Overview

Dataset statistics

Number of variables: 28
Number of observations: 903653
Missing cells: 6389257
Missing cells (%): 25.3%
Duplicate rows: 38245
Duplicate rows (%): 4.2%
Total size in memory: 671.2 MiB
Average record size in memory: 778.8 B
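
The percentage and per-record figures above follow directly from the raw counts. A minimal Python sketch of that arithmetic, using only the numbers reported in this block (the variable names are illustrative, and the record size is approximate because the reported total is rounded):

```python
# Figures copied from the "Dataset statistics" block.
n_variables = 28
n_observations = 903_653
missing_cells = 6_389_257
duplicate_rows = 38_245
total_size_mib = 671.2

# Every cell is one (row, column) pair.
total_cells = n_variables * n_observations

missing_pct = 100 * missing_cells / total_cells
duplicate_pct = 100 * duplicate_rows / n_observations

# 1 MiB = 1024 * 1024 bytes.
avg_record_bytes = total_size_mib * 1024 * 1024 / n_observations

print(f"Missing cells: {missing_pct:.1f}%")
print(f"Duplicate rows: {duplicate_pct:.1f}%")
print(f"Average record size: about {avg_record_bytes:.0f} B")
```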

Variable types

Categorical: 17
Numeric: 6
Text: 2
Boolean: 1
Path: 1
DateTime: 1

Alerts

Dataset has 38245 (4.2%) duplicate rows [Duplicates]
country has a high cardinality: 222 distinct values [High cardinality]
region has a high cardinality: 376 distinct values [High cardinality]
metro has a high cardinality: 94 distinct values [High cardinality]
city has a high cardinality: 649 distinct values [High cardinality]
source has a high cardinality: 380 distinct values [High cardinality]
keyword has a high cardinality: 3636 distinct values [High cardinality]
adContent is highly overall correlated with campaign and 3 other fields [High correlation]
adwordsClickInfo.adNetworkType is highly overall correlated with channelGrouping and 1 other fields [High correlation]
adwordsClickInfo.page is highly overall correlated with channelGrouping and 1 other fields [High correlation]
adwordsClickInfo.slot is highly overall correlated with channelGrouping and 1 other fields [High correlation]
campaign is highly overall correlated with adContent and 2 other fields [High correlation]
channelGrouping is highly overall correlated with adContent and 5 other fields [High correlation]
continent is highly overall correlated with subContinent [High correlation]
date is highly overall correlated with adContent [High correlation]
deviceCategory is highly overall correlated with isMobile and 1 other fields [High correlation]
hits is highly overall correlated with pageviews [High correlation]
isMobile is highly overall correlated with deviceCategory and 1 other fields [High correlation]
medium is highly overall correlated with adContent and 5 other fields [High correlation]
operatingSystem is highly overall correlated with deviceCategory and 1 other fields [High correlation]
pageviews is highly overall correlated with hits [High correlation]
subContinent is highly overall correlated with continent [High correlation]
region is highly imbalanced (61.3%) [Imbalance]
metro is highly imbalanced (68.6%) [Imbalance]
city is highly imbalanced (60.4%) [Imbalance]
campaign is highly imbalanced (90.3%) [Imbalance]
source is highly imbalanced (71.8%) [Imbalance]
keyword is highly imbalanced (92.3%) [Imbalance]
adwordsClickInfo.slot is highly imbalanced (83.9%) [Imbalance]
adwordsClickInfo.adNetworkType is highly imbalanced (99.6%) [Imbalance]
conversion is highly imbalanced (90.2%) [Imbalance]
transactionRevenue has 892138 (98.7%) missing values [Missing]
keyword has 502929 (55.7%) missing values [Missing]
referralPath has 572712 (63.4%) missing values [Missing]
adwordsClickInfo.page has 882193 (97.6%) missing values [Missing]
adwordsClickInfo.slot has 882193 (97.6%) missing values [Missing]
adwordsClickInfo.gclId has 882092 (97.6%) missing values [Missing]
adwordsClickInfo.adNetworkType has 882193 (97.6%) missing values [Missing]
adContent has 892707 (98.8%) missing values [Missing]
transactionRevenue is highly skewed (γ1 = 25.7227026) [Skewed]
adwordsClickInfo.page is highly skewed (γ1 = 40.17090183) [Skewed]
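
The imbalance percentages in the alerts are easier to read with a formula in hand. The sketch below assumes the score is one minus the normalized Shannon entropy of the category counts, which is how ydata-profiling is understood to compute it; treat that as an assumption and check it against the installed version. The function name `imbalance_score` is illustrative, and the example counts are the `continent` values reported later in this report.

```python
import math

def imbalance_score(counts):
    """Return 1 - H / log2(k): near 0 for a uniform split, approaching 1 as one class dominates."""
    counts = [c for c in counts if c > 0]
    k = len(counts)
    if k <= 1:
        return 0.0  # assumption: a single class is not scored as imbalanced
    total = sum(counts)
    entropy = -sum((c / total) * math.log2(c / total) for c in counts)
    return 1 - entropy / math.log2(k)

# continent counts: Americas, Asia, Europe, Oceania, Africa, (not set)
continent_counts = [450377, 223698, 198311, 15054, 14745, 1468]
score = imbalance_score(continent_counts)
print(f"continent imbalance: {100 * score:.1f}%")
```

Under this assumed formula `continent` scores well below the values flagged above, which is consistent with it not appearing among the imbalance alerts.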

Reproduction

Analysis started: 2024-02-20 13:13:40.124676
Analysis finished: 2024-02-20 13:15:20.019628
Duration: 1 minute and 39.89 seconds
Software version: ydata-profiling v4.6.4
Download configuration: config.json
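
A report like this one can be regenerated along the following lines. This is a sketch rather than the configuration actually used: the helper name `build_report`, the file paths, and the report title are assumptions, and the real settings live in the downloadable config.json, which is not reproduced here. It assumes pandas and ydata-profiling are installed.

```python
def build_report(csv_path, html_path):
    """Profile a CSV file and write an HTML report (hypothetical helper)."""
    # Imported lazily so this module still loads where the libraries are absent.
    import pandas as pd
    from ydata_profiling import ProfileReport

    df = pd.read_csv(csv_path)
    profile = ProfileReport(df, title="Overview")  # title is an assumption
    profile.to_file(html_path)
    return profile
```

A call such as `build_report("sessions.csv", "report.html")` would then write the report; both file names are placeholders.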

Variables

channelGrouping
Categorical

HIGH CORRELATION 

Distinct: 8
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 57.6 MiB

Organic Search     381561
Social             226117
Direct             143026
Referral           104838
Paid Search        25326
Other values (3)   22785

Length

Max length: 14
Median length: 11
Mean length: 9.8297754
Min length: 6

Characters and Unicode

Total characters: 8882706
Distinct characters: 25
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
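
As an illustration of how such character properties can be tallied, the sketch below uses Python's standard `unicodedata` module on the value counts from this variable's Common Values table. It is not the profiler's own code; the dictionary and variable names are illustrative. `unicodedata.category` returns two-letter codes: `Ll` (lowercase letter), `Lu` (uppercase letter), `Zs` (space separator), `Ps` (open punctuation), `Pe` (close punctuation).

```python
import unicodedata
from collections import Counter

# Value counts copied from the channelGrouping "Common Values" table.
value_counts = {
    "Organic Search": 381561,
    "Social": 226117,
    "Direct": 143026,
    "Referral": 104838,
    "Paid Search": 25326,
    "Affiliates": 16403,
    "Display": 6262,
    "(Other)": 120,
}

category_counts = Counter()
for value, count in value_counts.items():
    for ch in value:
        # Weight each character by how many rows contain this value.
        category_counts[unicodedata.category(ch)] += count

total_characters = sum(category_counts.values())
print(dict(category_counts))
print(total_characters)
```

The totals per category agree with the "Most occurring categories" table for this variable.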

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Organic Search
2nd row: Organic Search
3rd row: Organic Search
4th row: Organic Search
5th row: Organic Search

Common Values

Value            Count    Frequency (%)
Organic Search   381561   42.2%
Social           226117   25.0%
Direct           143026   15.8%
Referral         104838   11.6%
Paid Search      25326    2.8%
Affiliates       16403    1.8%
Display          6262     0.7%
(Other)          120      < 0.1%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
search 406887
31.0%
organic 381561
29.1%
social 226117
17.3%
direct 143026
 
10.9%
referral 104838
 
8.0%
paid 25326
 
1.9%
affiliates 16403
 
1.3%
display 6262
 
0.5%
other 120
 
< 0.1%

Most occurring characters

ValueCountFrequency (%)
a 1167394
13.1%
c 1157591
13.0%
r 1141270
12.8%
i 815098
9.2%
e 776112
8.7%
S 633004
 
7.1%
h 407007
 
4.6%
406887
 
4.6%
O 381681
 
4.3%
g 381561
 
4.3%
Other values (15) 1615101
18.2%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 7165039
80.7%
Uppercase Letter 1310540
 
14.8%
Space Separator 406887
 
4.6%
Open Punctuation 120
 
< 0.1%
Close Punctuation 120
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a 1167394
16.3%
c 1157591
16.2%
r 1141270
15.9%
i 815098
11.4%
e 776112
10.8%
h 407007
 
5.7%
g 381561
 
5.3%
n 381561
 
5.3%
l 353620
 
4.9%
o 226117
 
3.2%
Other values (6) 357708
 
5.0%
Uppercase Letter
ValueCountFrequency (%)
S 633004
48.3%
O 381681
29.1%
D 149288
 
11.4%
R 104838
 
8.0%
P 25326
 
1.9%
A 16403
 
1.3%
Space Separator
ValueCountFrequency (%)
406887
100.0%
Open Punctuation
ValueCountFrequency (%)
( 120
100.0%
Close Punctuation
ValueCountFrequency (%)
) 120
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 8475579
95.4%
Common 407127
 
4.6%

Most frequent character per script

Latin
ValueCountFrequency (%)
a 1167394
13.8%
c 1157591
13.7%
r 1141270
13.5%
i 815098
9.6%
e 776112
9.2%
S 633004
7.5%
h 407007
 
4.8%
O 381681
 
4.5%
g 381561
 
4.5%
n 381561
 
4.5%
Other values (12) 1233300
14.6%
Common
ValueCountFrequency (%)
406887
99.9%
( 120
 
< 0.1%
) 120
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 8882706
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
a 1167394
13.1%
c 1157591
13.0%
r 1141270
12.8%
i 815098
9.2%
e 776112
8.7%
S 633004
 
7.1%
h 407007
 
4.6%
406887
 
4.6%
O 381681
 
4.3%
g 381561
 
4.3%
Other values (15) 1615101
18.2%

date
Real number (ℝ)

HIGH CORRELATION 

Distinct: 366
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 20165885
Minimum: 20160801
Maximum: 20170801
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 6.9 MiB

Quantile statistics

Minimum: 20160801
5-th percentile: 20160819
Q1: 20161027
median: 20170109
Q3: 20170421
95-th percentile: 20170713
Maximum: 20170801
Range: 10000
Interquartile range (IQR): 9394

Descriptive statistics

Standard deviation: 4697.6976
Coefficient of variation (CV): 0.00023295271
Kurtosis: -1.9901402
Mean: 20165885
Median Absolute Deviation (MAD): 617
Skewness: -0.066677208
Sum: 1.8222963 × 10^13
Variance: 22068362
Monotonicity: Not monotonic
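
Because `date` was profiled as a real number, the statistics above treat calendar dates as ordinary integers, so figures such as the range, mean, and variance do not measure elapsed time. The sketch below assumes the values are encoded as YYYYMMDD, which the reported minimum and maximum suggest but the report does not state; the helper name `parse_yyyymmdd` is hypothetical.

```python
from datetime import datetime

def parse_yyyymmdd(value):
    """Convert an integer such as 20160801 into a date (assumes YYYYMMDD)."""
    return datetime.strptime(str(value), "%Y%m%d").date()

first = parse_yyyymmdd(20160801)  # reported Minimum
last = parse_yyyymmdd(20170801)   # reported Maximum

numeric_range = 20170801 - 20160801  # what the report lists as "Range"
calendar_span = (last - first).days  # elapsed days between the two dates

print(numeric_range, calendar_span)
```

Under that assumption the span covers one more calendar day than `calendar_span`, counting both endpoints, which matches the 366 distinct values reported. Parsing the column into a datetime type before profiling would likely yield more meaningful summaries.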
Histogram with fixed size bins (bins=50)
Value                Count    Frequency (%)
20161128             4807     0.5%
20161115             4685     0.5%
20161114             4466     0.5%
20161130             4435     0.5%
20161026             4375     0.5%
20161129             4337     0.5%
20161116             4334     0.5%
20161004             4322     0.5%
20161205             4265     0.5%
20170426             4224     0.5%
Other values (356)   859403   95.1%

Value      Count   Frequency (%)
20160801   1711    0.2%
20160802   2140    0.2%
20160803   2890    0.3%
20160804   3161    0.3%
20160805   2702    0.3%
20160806   1663    0.2%
20160807   1622    0.2%
20160808   2815    0.3%
20160809   2851    0.3%
20160810   2757    0.3%

Value      Count   Frequency (%)
20170801   2556    0.3%
20170731   2620    0.3%
20170730   1799    0.2%
20170729   1597    0.2%
20170728   2433    0.3%
20170727   2529    0.3%
20170726   2725    0.3%
20170725   2631    0.3%
20170724   2436    0.3%
20170723   1966    0.2%

visitNumber
Real number (ℝ)

Distinct: 384
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.264897
Minimum: 1
Maximum: 395
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 6.9 MiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 1
median: 1
Q3: 1
95-th percentile: 5
Maximum: 395
Range: 394
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 9.2837345
Coefficient of variation (CV): 4.0989654
Kurtosis: 517.3079
Mean: 2.264897
Median Absolute Deviation (MAD): 0
Skewness: 19.998064
Sum: 2046681
Variance: 86.187726
Monotonicity: Not monotonic
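
Several of these descriptive statistics are tied together by simple identities. As a quick consistency check, the sketch below recomputes the coefficient of variation (standard deviation divided by mean) and recovers the standard deviation from the variance, using only the figures listed above; the variable names are illustrative.

```python
import math

# Figures copied from the visitNumber descriptive statistics.
mean = 2.264897
std_dev = 9.2837345
variance = 86.187726

cv = std_dev / mean                      # coefficient of variation
std_from_variance = math.sqrt(variance)  # should reproduce the standard deviation

print(f"CV = {cv:.4f}")
print(f"sqrt(variance) = {std_from_variance:.4f}")
```

A coefficient of variation this far above 1, together with a median of 1 and a maximum of 395, reflects a long right tail: most sessions are first visits and a small number of visitors return many times.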
Histogram with fixed size bins (bins=50)
Value                Count    Frequency (%)
1                    703060   77.8%
2                    92548    10.2%
3                    35843    4.0%
4                    19157    2.1%
5                    11615    1.3%
6                    7677     0.8%
7                    5413     0.6%
8                    4031     0.4%
9                    3084     0.3%
10                   2415     0.3%
Other values (374)   18810    2.1%

Value   Count    Frequency (%)
1       703060   77.8%
2       92548    10.2%
3       35843    4.0%
4       19157    2.1%
5       11615    1.3%
6       7677     0.8%
7       5413     0.6%
8       4031     0.4%
9       3084     0.3%
10      2415     0.3%

Value   Count   Frequency (%)
395     1       < 0.1%
394     1       < 0.1%
393     1       < 0.1%
391     1       < 0.1%
390     1       < 0.1%
389     1       < 0.1%
388     1       < 0.1%
387     1       < 0.1%
386     1       < 0.1%
385     1       < 0.1%

continent
Categorical

HIGH CORRELATION 

Distinct: 6
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 54.7 MiB

Americas   450377
Asia       223698
Europe     198311
Oceania    15054
Africa     14745

Length

Max length: 9
Median length: 8
Mean length: 6.5232274
Min length: 4

Characters and Unicode

Total characters: 5894734
Distinct characters: 19
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Asia
2nd row: Oceania
3rd row: Europe
4th row: Asia
5th row: Europe

Common Values

Value       Count    Frequency (%)
Americas    450377   49.8%
Asia        223698   24.8%
Europe      198311   21.9%
Oceania     15054    1.7%
Africa      14745    1.6%
(not set)   1468     0.2%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
americas 450377
49.8%
asia 223698
24.7%
europe 198311
21.9%
oceania 15054
 
1.7%
africa 14745
 
1.6%
not 1468
 
0.2%
set 1468
 
0.2%

Most occurring characters

ValueCountFrequency (%)
a 718928
12.2%
i 703874
11.9%
A 688820
11.7%
s 675543
11.5%
e 665210
11.3%
r 663433
11.3%
c 480176
8.1%
m 450377
7.6%
o 199779
 
3.4%
p 198311
 
3.4%
Other values (9) 450283
7.6%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 4988145
84.6%
Uppercase Letter 902185
 
15.3%
Open Punctuation 1468
 
< 0.1%
Space Separator 1468
 
< 0.1%
Close Punctuation 1468
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a 718928
14.4%
i 703874
14.1%
s 675543
13.5%
e 665210
13.3%
r 663433
13.3%
c 480176
9.6%
m 450377
9.0%
o 199779
 
4.0%
p 198311
 
4.0%
u 198311
 
4.0%
Other values (3) 34203
 
0.7%
Uppercase Letter
ValueCountFrequency (%)
A 688820
76.4%
E 198311
 
22.0%
O 15054
 
1.7%
Open Punctuation
ValueCountFrequency (%)
( 1468
100.0%
Space Separator
ValueCountFrequency (%)
1468
100.0%
Close Punctuation
ValueCountFrequency (%)
) 1468
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 5890330
99.9%
Common 4404
 
0.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
a 718928
12.2%
i 703874
11.9%
A 688820
11.7%
s 675543
11.5%
e 665210
11.3%
r 663433
11.3%
c 480176
8.2%
m 450377
7.6%
o 199779
 
3.4%
p 198311
 
3.4%
Other values (6) 445879
7.6%
Common
ValueCountFrequency (%)
( 1468
33.3%
1468
33.3%
) 1468
33.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII 5894734
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
a 718928
12.2%
i 703874
11.9%
A 688820
11.7%
s 675543
11.5%
e 665210
11.3%
r 663433
11.3%
c 480176
8.1%
m 450377
7.6%
o 199779
 
3.4%
p 198311
 
3.4%
Other values (9) 450283
7.6%

subContinent
Categorical

HIGH CORRELATION 

Distinct: 23
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 61.7 MiB

Northern America    390657
Southeast Asia      77800
Southern Asia       59321
Western Europe      59114
Northern Europe     58168
Other values (18)   258593

Length

Max length: 18
Median length: 16
Mean length: 14.621631
Min length: 9

Characters and Unicode

Total characters: 13212881
Distinct characters: 31
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Western Asia
2nd row: Australasia
3rd row: Southern Europe
4th row: Southeast Asia
5th row: Northern Europe

Common Values

Value               Count    Frequency (%)
Northern America    390657   43.2%
Southeast Asia      77800    8.6%
Southern Asia       59321    6.6%
Western Europe      59114    6.5%
Northern Europe     58168    6.4%
Eastern Asia        46919    5.2%
Eastern Europe      45249    5.0%
South America       41731    4.6%
Western Asia        38443    4.3%
Southern Europe     35780    4.0%
Other values (13)   50471    5.6%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
northern 456508
25.5%
america 447971
25.0%
asia 223698
12.5%
europe 198311
11.1%
western 100130
 
5.6%
southern 97270
 
5.4%
eastern 94095
 
5.3%
southeast 77800
 
4.3%
south 41731
 
2.3%
central 16798
 
0.9%
Other values (10) 35589
 
2.0%

Most occurring characters

ValueCountFrequency (%)
r 1899690
14.4%
e 1593577
12.1%
t 979961
 
7.4%
a 924840
 
7.0%
886248
 
6.7%
o 873223
 
6.6%
n 768946
 
5.8%
i 704377
 
5.3%
A 701307
 
5.3%
h 673309
 
5.1%
Other values (21) 3207403
24.3%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 10536732
79.7%
Uppercase Letter 1786965
 
13.5%
Space Separator 886248
 
6.7%
Open Punctuation 1468
 
< 0.1%
Close Punctuation 1468
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
r 1899690
18.0%
e 1593577
15.1%
t 979961
9.3%
a 924840
8.8%
o 873223
8.3%
n 768946
7.3%
i 704377
 
6.7%
h 673309
 
6.4%
s 527138
 
5.0%
c 462771
 
4.4%
Other values (9) 1128900
10.7%
Uppercase Letter
ValueCountFrequency (%)
A 701307
39.2%
N 456508
25.5%
E 292406
16.4%
S 216801
 
12.1%
W 100130
 
5.6%
C 19204
 
1.1%
M 529
 
< 0.1%
R 55
 
< 0.1%
P 25
 
< 0.1%
Space Separator
ValueCountFrequency (%)
886248
100.0%
Open Punctuation
ValueCountFrequency (%)
( 1468
100.0%
Close Punctuation
ValueCountFrequency (%)
) 1468
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 12323697
93.3%
Common 889184
 
6.7%

Most frequent character per script

Latin
ValueCountFrequency (%)
r 1899690
15.4%
e 1593577
12.9%
t 979961
 
8.0%
a 924840
 
7.5%
o 873223
 
7.1%
n 768946
 
6.2%
i 704377
 
5.7%
A 701307
 
5.7%
h 673309
 
5.5%
s 527138
 
4.3%
Other values (18) 2677329
21.7%
Common
ValueCountFrequency (%)
886248
99.7%
( 1468
 
0.2%
) 1468
 
0.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII 13212881
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
r 1899690
14.4%
e 1593577
12.1%
t 979961
 
7.4%
a 924840
 
7.0%
886248
 
6.7%
o 873223
 
6.6%
n 768946
 
5.8%
i 704377
 
5.3%
A 701307
 
5.3%
h 673309
 
5.1%
Other values (21) 3207403
24.3%

country
Categorical

HIGH CARDINALITY 

Distinct: 222
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.7 MiB

United-States        364744
India                51140
United-Kingdom       37393
Canada               25869
Vietnam              24598
Other values (217)   399909

Length

Max length: 24
Median length: 22
Mean length: 9.7230386
Min length: 4

Characters and Unicode

Total characters: 8786253
Distinct characters: 62
Distinct categories: 7
Distinct scripts: 2
Distinct blocks: 3

Unique

Unique: 10
Unique (%): < 0.1%

Sample

1st row: Turkey
2nd row: Australia
3rd row: Spain
4th row: Indonesia
5th row: United-Kingdom

Common Values

Value                Count    Frequency (%)
United-States        364744   40.4%
India                51140    5.7%
United-Kingdom       37393    4.1%
Canada               25869    2.9%
Vietnam              24598    2.7%
Turkey               20522    2.3%
Thailand             20123    2.2%
Germany              19980    2.2%
Brazil               19783    2.2%
Japan                19731    2.2%
Other values (212)   299770   33.2%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
united-states 364744
40.4%
india 51140
 
5.7%
united-kingdom 37393
 
4.1%
canada 25869
 
2.9%
vietnam 24598
 
2.7%
turkey 20522
 
2.3%
thailand 20123
 
2.2%
germany 19980
 
2.2%
brazil 19783
 
2.2%
japan 19731
 
2.2%
Other values (212) 299770
33.2%

Most occurring characters

ValueCountFrequency (%)
t 1238439
14.1%
e 1021620
11.6%
a 965182
11.0%
n 792306
9.0%
i 766000
8.7%
d 601015
 
6.8%
s 464391
 
5.3%
- 435615
 
5.0%
U 411769
 
4.7%
S 410463
 
4.7%
Other values (52) 1679453
19.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 7005686
79.7%
Uppercase Letter 1338417
 
15.2%
Dash Punctuation 435615
 
5.0%
Open Punctuation 2510
 
< 0.1%
Close Punctuation 2510
 
< 0.1%
Other Punctuation 1195
 
< 0.1%
Final Punctuation 320
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
t 1238439
17.7%
e 1021620
14.6%
a 965182
13.8%
n 792306
11.3%
i 766000
10.9%
d 601015
8.6%
s 464391
 
6.6%
r 198000
 
2.8%
l 151162
 
2.2%
o 146869
 
2.1%
Other values (21) 660702
9.4%
Uppercase Letter
ValueCountFrequency (%)
U 411769
30.8%
S 410463
30.7%
I 85110
 
6.4%
T 55494
 
4.1%
K 50288
 
3.8%
C 44495
 
3.3%
A 32529
 
2.4%
P 32433
 
2.4%
B 31573
 
2.4%
V 26788
 
2.0%
Other values (15) 157475
 
11.8%
Other Punctuation
ValueCountFrequency (%)
& 1106
92.6%
. 89
 
7.4%
Dash Punctuation
ValueCountFrequency (%)
- 435615
100.0%
Open Punctuation
ValueCountFrequency (%)
( 2510
100.0%
Close Punctuation
ValueCountFrequency (%)
) 2510
100.0%
Final Punctuation
ValueCountFrequency (%)
320
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 8344103
95.0%
Common 442150
 
5.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
t 1238439
14.8%
e 1021620
12.2%
a 965182
11.6%
n 792306
9.5%
i 766000
9.2%
d 601015
7.2%
s 464391
 
5.6%
U 411769
 
4.9%
S 410463
 
4.9%
r 198000
 
2.4%
Other values (46) 1474918
17.7%
Common
ValueCountFrequency (%)
- 435615
98.5%
( 2510
 
0.6%
) 2510
 
0.6%
& 1106
 
0.3%
320
 
0.1%
. 89
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 8785433
> 99.9%
None 500
 
< 0.1%
Punctuation 320
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
t 1238439
14.1%
e 1021620
11.6%
a 965182
11.0%
n 792306
9.0%
i 766000
8.7%
d 601015
 
6.8%
s 464391
 
5.3%
- 435615
 
5.0%
U 411769
 
4.7%
S 410463
 
4.7%
Other values (45) 1678633
19.1%
None
ValueCountFrequency (%)
ô 320
64.0%
é 147
29.4%
ç 30
 
6.0%
ã 1
 
0.2%
í 1
 
0.2%
Å 1
 
0.2%
Punctuation
ValueCountFrequency (%)
320
100.0%

region
Categorical

HIGH CARDINALITY  IMBALANCE 

Distinct: 376
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.8 MiB

not-available-in-demo-dataset   508229
California                      107495
(not-set)                       27827
New-York                        26433
England                         13198
Other values (371)              220471

Length

Max length: 33
Median length: 29
Mean length: 20.613868
Min length: 4

Characters and Unicode

Total characters: 18627784
Distinct characters: 55
Distinct categories: 6
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Izmir
2nd row: not-available-in-demo-dataset
3rd row: Community-of-Madrid
4th row: not-available-in-demo-dataset
5th row: not-available-in-demo-dataset

Common Values

Value                           Count    Frequency (%)
not-available-in-demo-dataset   508229   56.2%
California                      107495   11.9%
(not-set)                       27827    3.1%
New-York                        26433    2.9%
England                         13198    1.5%
Texas                           8749     1.0%
Bangkok                         7709     0.9%
Washington                      7642     0.8%
Illinois                        7585     0.8%
Ho-Chi-Minh                     7250     0.8%
Other values (366)              181536   20.1%
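
The report lists `region` as having no missing values, yet `not-available-in-demo-dataset` and `(not-set)` read like placeholders rather than real regions. Treating them as missing is an assumption about the data, not something the report states. Under that assumption, the share of rows without a usable region can be estimated from the counts above and the number of observations given in the dataset overview:

```python
# Counts copied from the region "Common Values" table and the dataset overview.
n_observations = 903_653
placeholder_counts = {
    "not-available-in-demo-dataset": 508_229,
    "(not-set)": 27_827,
}

placeholder_rows = sum(placeholder_counts.values())
effective_missing_pct = 100 * placeholder_rows / n_observations

print(f"{placeholder_rows} rows ({effective_missing_pct:.1f}%) carry a placeholder region")
```

The same reasoning may apply to `metro` and `city`, which show the same two placeholder values.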

Length

Histogram of lengths of the category
ValueCountFrequency (%)
not-available-in-demo-dataset 508229
56.2%
california 107495
 
11.9%
not-set 27827
 
3.1%
new-york 26433
 
2.9%
england 13198
 
1.5%
texas 8749
 
1.0%
bangkok 7709
 
0.9%
washington 7642
 
0.8%
illinois 7585
 
0.8%
ho-chi-minh 7250
 
0.8%
Other values (366) 181536
 
20.1%

Most occurring characters

ValueCountFrequency (%)
a 3037450
16.3%
- 2226484
12.0%
e 1709137
9.2%
t 1704294
9.1%
i 1434438
7.7%
o 1338634
7.2%
n 1327976
7.1%
l 1227746
6.6%
d 1063436
 
5.7%
s 629109
 
3.4%
Other values (45) 2929080
15.7%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 15832688
85.0%
Dash Punctuation 2226484
 
12.0%
Uppercase Letter 512694
 
2.8%
Close Punctuation 27827
 
0.1%
Open Punctuation 27827
 
0.1%
Other Punctuation 264
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a 3037450
19.2%
e 1709137
10.8%
t 1704294
10.8%
i 1434438
9.1%
o 1338634
8.5%
n 1327976
8.4%
l 1227746
7.8%
d 1063436
 
6.7%
s 629109
 
4.0%
m 532538
 
3.4%
Other values (16) 1827930
11.5%
Uppercase Letter
ValueCountFrequency (%)
C 145934
28.5%
N 43828
 
8.5%
M 36782
 
7.2%
T 35969
 
7.0%
Y 26465
 
5.2%
D 21969
 
4.3%
S 21727
 
4.2%
I 19813
 
3.9%
B 19351
 
3.8%
H 17118
 
3.3%
Other values (15) 123738
24.1%
Dash Punctuation
ValueCountFrequency (%)
- 2226484
100.0%
Close Punctuation
ValueCountFrequency (%)
) 27827
100.0%
Open Punctuation
ValueCountFrequency (%)
( 27827
100.0%
Other Punctuation
ValueCountFrequency (%)
' 264
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 16345382
87.7%
Common 2282402
 
12.3%

Most frequent character per script

Latin
ValueCountFrequency (%)
a 3037450
18.6%
e 1709137
10.5%
t 1704294
10.4%
i 1434438
8.8%
o 1338634
8.2%
n 1327976
8.1%
l 1227746
7.5%
d 1063436
 
6.5%
s 629109
 
3.8%
m 532538
 
3.3%
Other values (41) 2340624
14.3%
Common
ValueCountFrequency (%)
- 2226484
97.6%
) 27827
 
1.2%
( 27827
 
1.2%
' 264
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 18627784
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
a 3037450
16.3%
- 2226484
12.0%
e 1709137
9.2%
t 1704294
9.1%
i 1434438
7.7%
o 1338634
7.2%
n 1327976
7.1%
l 1227746
6.6%
d 1063436
 
5.7%
s 629109
 
3.4%
Other values (45) 2929080
15.7%

metro
Categorical

HIGH CARDINALITY  IMBALANCE 

Distinct: 94
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 891.4 KiB

not-available-in-demo-dataset       508229
(not-set)                           201766
San-Francisco-Oakland-San-Jose-CA   95913
New-York-NY                         26917
London                              12571
Other values (89)                   58257

Length

Max length: 41
Median length: 29
Mean length: 23.187242
Min length: 6

Characters and Unicode

Total characters: 20953221
Distinct characters: 56
Distinct categories: 7
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: (not-set)
2nd row: not-available-in-demo-dataset
3rd row: (not-set)
4th row: not-available-in-demo-dataset
5th row: not-available-in-demo-dataset

Common Values

Value                               Count    Frequency (%)
not-available-in-demo-dataset       508229   56.2%
(not-set)                           201766   22.3%
San-Francisco-Oakland-San-Jose-CA   95913    10.6%
New-York-NY                         26917    3.0%
London                              12571    1.4%
Los-Angeles-CA                      9995     1.1%
Seattle-Tacoma-WA                   7642     0.8%
Chicago-IL                          7585     0.8%
Austin-TX                           3790     0.4%
Washington-DC-(Hagerstown-MD)       3380     0.4%
Other values (84)                   25865    2.9%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
not-available-in-demo-dataset 508229
56.2%
not-set 201766
 
22.3%
san-francisco-oakland-san-jose-ca 95913
 
10.6%
new-york-ny 26917
 
3.0%
london 12571
 
1.4%
los-angeles-ca 9995
 
1.1%
seattle-tacoma-wa 7642
 
0.8%
chicago-il 7585
 
0.8%
austin-tx 3790
 
0.4%
washington-dc-(hagerstown-md 3380
 
0.4%
Other values (84) 25865
 
2.9%

Most occurring characters

ValueCountFrequency (%)
a 3084714
14.7%
- 2869030
13.7%
t 1983594
9.5%
e 1908676
9.1%
n 1674938
8.0%
o 1522326
7.3%
l 1144979
 
5.5%
i 1137893
 
5.4%
d 1129116
 
5.4%
s 947069
 
4.5%
Other values (46) 3550886
16.9%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 16654529
79.5%
Dash Punctuation 2869030
 
13.7%
Uppercase Letter 1015958
 
4.8%
Close Punctuation 205397
 
1.0%
Open Punctuation 205397
 
1.0%
Other Punctuation 2593
 
< 0.1%
Connector Punctuation 317
 
< 0.1%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
S 201944
19.9%
A 142147
14.0%
C 122747
12.1%
O 99783
9.8%
F 98863
9.7%
J 96257
9.5%
N 58712
 
5.8%
Y 54047
 
5.3%
L 33820
 
3.3%
T 17187
 
1.7%
Other values (15) 90451
8.9%
Lowercase Letter
ValueCountFrequency (%)
a 3084714
18.5%
t 1983594
11.9%
e 1908676
11.5%
n 1674938
10.1%
o 1522326
9.1%
l 1144979
 
6.9%
i 1137893
 
6.8%
d 1129116
 
6.8%
s 947069
 
5.7%
m 517042
 
3.1%
Other values (14) 1604182
9.6%
Other Punctuation
ValueCountFrequency (%)
. 2520
97.2%
& 61
 
2.4%
, 12
 
0.5%
Dash Punctuation
ValueCountFrequency (%)
- 2869030
100.0%
Close Punctuation
ValueCountFrequency (%)
) 205397
100.0%
Open Punctuation
ValueCountFrequency (%)
( 205397
100.0%
Connector Punctuation
ValueCountFrequency (%)
_ 317
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 17670487
84.3%
Common 3282734
 
15.7%

Most frequent character per script

Latin
ValueCountFrequency (%)
a 3084714
17.5%
t 1983594
11.2%
e 1908676
10.8%
n 1674938
9.5%
o 1522326
8.6%
l 1144979
 
6.5%
i 1137893
 
6.4%
d 1129116
 
6.4%
s 947069
 
5.4%
m 517042
 
2.9%
Other values (39) 2620140
14.8%
Common
ValueCountFrequency (%)
- 2869030
87.4%
) 205397
 
6.3%
( 205397
 
6.3%
. 2520
 
0.1%
_ 317
 
< 0.1%
& 61
 
< 0.1%
, 12
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 20953221
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
a 3084714
14.7%
- 2869030
13.7%
t 1983594
9.5%
e 1908676
9.1%
n 1674938
8.0%
o 1522326
7.3%
l 1144979
 
5.5%
i 1137893
 
5.4%
d 1129116
 
5.4%
s 947069
 
4.5%
Other values (46) 3550886
16.9%

city
Categorical

HIGH CARDINALITY  IMBALANCE 

Distinct: 649
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.8 MiB

not-available-in-demo-dataset   508229
Mountain-View                   40884
(not-set)                       34262
New-York                        26371
San-Francisco                   20329
Other values (644)              273578

Length

Max length: 33
Median length: 29
Mean length: 20.198846
Min length: 3

Characters and Unicode

Total characters: 18252748
Distinct characters: 57
Distinct categories: 6
Distinct scripts: 2
Distinct blocks: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Izmir
2nd row: not-available-in-demo-dataset
3rd row: Madrid
4th row: not-available-in-demo-dataset
5th row: not-available-in-demo-dataset

Common Values

Value                           Count    Frequency (%)
not-available-in-demo-dataset   508229   56.2%
Mountain-View                   40884    4.5%
(not-set)                       34262    3.8%
New-York                        26371    2.9%
San-Francisco                   20329    2.2%
Sunnyvale                       13086    1.4%
London                          12607    1.4%
San-Jose                        10295    1.1%
Los-Angeles                     8670     1.0%
Bangkok                         7709     0.9%
Other values (639)              221211   24.5%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
not-available-in-demo-dataset 508229
56.2%
mountain-view 40884
 
4.5%
not-set 34262
 
3.8%
new-york 26371
 
2.9%
san-francisco 20329
 
2.2%
sunnyvale 13086
 
1.4%
london 12607
 
1.4%
san-jose 10295
 
1.1%
los-angeles 8670
 
1.0%
bangkok 7709
 
0.9%
Other values (639) 221211
24.5%

Most occurring characters

ValueCountFrequency (%)
a 2851125
15.6%
- 2248004
12.3%
e 1772459
9.7%
t 1723610
9.4%
n 1388588
7.6%
o 1327491
7.3%
i 1247087
6.8%
l 1121717
 
6.1%
d 1065724
 
5.8%
s 638731
 
3.5%
Other values (47) 2868212
15.7%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 15395189
84.3%
Dash Punctuation 2248004
 
12.3%
Uppercase Letter 540915
 
3.0%
Open Punctuation 34262
 
0.2%
Close Punctuation 34262
 
0.2%
Other Punctuation 116
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a 2851125
18.5%
e 1772459
11.5%
t 1723610
11.2%
n 1388588
9.0%
o 1327491
8.6%
i 1247087
8.1%
l 1121717
 
7.3%
d 1065724
 
6.9%
s 638731
 
4.1%
b 545505
 
3.5%
Other values (17) 1713152
11.1%
Uppercase Letter
ValueCountFrequency (%)
S 77212
14.3%
M 73829
13.6%
V 44431
 
8.2%
C 42122
 
7.8%
A 34771
 
6.4%
N 32069
 
5.9%
Y 30238
 
5.6%
B 29697
 
5.5%
L 27718
 
5.1%
H 25297
 
4.7%
Other values (15) 123531
22.8%
Other Punctuation
ValueCountFrequency (%)
' 92
79.3%
. 24
 
20.7%
Dash Punctuation
ValueCountFrequency (%)
- 2248004
100.0%
Open Punctuation
ValueCountFrequency (%)
( 34262
100.0%
Close Punctuation
ValueCountFrequency (%)
) 34262
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 15936104
87.3%
Common 2316644
 
12.7%

Most frequent character per script

Latin
ValueCountFrequency (%)
a 2851125
17.9%
e 1772459
11.1%
t 1723610
10.8%
n 1388588
8.7%
o 1327491
8.3%
i 1247087
7.8%
l 1121717
 
7.0%
d 1065724
 
6.7%
s 638731
 
4.0%
b 545505
 
3.4%
Other values (42) 2254067
14.1%
Common
ValueCountFrequency (%)
- 2248004
97.0%
( 34262
 
1.5%
) 34262
 
1.5%
' 92
 
< 0.1%
. 24
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 18252664
> 99.9%
None 84
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
a 2851125
15.6%
- 2248004
12.3%
e 1772459
9.7%
t 1723610
9.4%
n 1388588
7.6%
o 1327491
7.3%
i 1247087
6.8%
l 1121717
 
6.1%
d 1065724
 
5.8%
s 638731
 
3.5%
Other values (46) 2868128
15.7%
None
ValueCountFrequency (%)
ã 84
100.0%
Distinct: 54
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 54.7 MiB

Length

Max length: 43
Median length: 6
Mean length: 6.460627
Min length: 1

Characters and Unicode

Total characters: 5838165
Distinct characters: 68
Distinct categories: 9
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 14
Unique (%): < 0.1%

Sample

1st row: Chrome
2nd row: Firefox
3rd row: Chrome
4th row: UC Browser
5th row: Chrome
Value               Count    Frequency (%)
chrome              620365   65.4%
safari              189095   19.9%
firefox             37069    3.9%
internet            19375    2.0%
explorer            19375    2.0%
opera               11782    1.2%
edge                10205    1.1%
android             8420     0.9%
webview             7865     0.8%
in-app              6850     0.7%
Other values (60)   18727    2.0%

Most occurring characters

ValueCountFrequency (%)
r 935843
16.0%
e 759760
13.0%
o 693831
11.9%
C 624899
10.7%
m 621325
10.6%
h 620632
10.6%
a 400787
6.9%
i 263237
 
4.5%
f 226356
 
3.9%
S 189714
 
3.2%
Other values (58) 501781
8.6%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 4822831
82.6%
Uppercase Letter 948991
 
16.3%
Space Separator 45475
 
0.8%
Dash Punctuation 6866
 
0.1%
Open Punctuation 6859
 
0.1%
Close Punctuation 6859
 
0.1%
Decimal Number 201
 
< 0.1%
Connector Punctuation 82
 
< 0.1%
Other Punctuation 1
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
r 935843
19.4%
e 759760
15.8%
o 693831
14.4%
m 621325
12.9%
h 620632
12.9%
a 400787
8.3%
i 263237
 
5.5%
f 226356
 
4.7%
n 61814
 
1.3%
x 56690
 
1.2%
Other values (16) 182556
 
3.8%
Uppercase Letter
ValueCountFrequency (%)
C 624899
65.8%
S 189714
 
20.0%
F 37094
 
3.9%
E 29851
 
3.1%
I 19410
 
2.0%
O 12046
 
1.3%
A 9371
 
1.0%
W 7865
 
0.8%
M 7320
 
0.8%
B 5654
 
0.6%
Other values (14) 5767
 
0.6%
Decimal Number
ValueCountFrequency (%)
2 63
31.3%
0 59
29.4%
1 29
14.4%
4 24
 
11.9%
7 10
 
5.0%
5 6
 
3.0%
3 4
 
2.0%
9 4
 
2.0%
8 1
 
0.5%
6 1
 
0.5%
Open Punctuation
ValueCountFrequency (%)
( 6858
> 99.9%
[ 1
 
< 0.1%
Close Punctuation
ValueCountFrequency (%)
) 6858
> 99.9%
] 1
 
< 0.1%
Space Separator
ValueCountFrequency (%)
45475
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 6866
100.0%
Connector Punctuation
ValueCountFrequency (%)
_ 82
100.0%
Other Punctuation
ValueCountFrequency (%)
: 1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 5771822
98.9%
Common 66343
 
1.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
r 935843
16.2%
e 759760
13.2%
o 693831
12.0%
C 624899
10.8%
m 621325
10.8%
h 620632
10.8%
a 400787
6.9%
i 263237
 
4.6%
f 226356
 
3.9%
S 189714
 
3.3%
Other values (40) 435438
7.5%
Common
ValueCountFrequency (%)
45475
68.5%
- 6866
 
10.3%
( 6858
 
10.3%
) 6858
 
10.3%
_ 82
 
0.1%
2 63
 
0.1%
0 59
 
0.1%
1 29
 
< 0.1%
4 24
 
< 0.1%
7 10
 
< 0.1%
Other values (8) 19
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 5838165
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
r 935843
16.0%
e 759760
13.0%
o 693831
11.9%
C 624899
10.7%
m 621325
10.6%
h 620632
10.6%
a 400787
6.9%
i 263237
 
4.5%
f 226356
 
3.9%
S 189714
 
3.2%
Other values (58) 501781
8.6%

operatingSystem
Categorical

HIGH CORRELATION 

Distinct20
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size55.2 MiB
Windows
350072 
Macintosh
253938 
Android
123892 
iOS
107665 
Linux
35034 
Other values (15)
 
33052

Length

Max length13
Median length7
Mean length7.0862532
Min length3

Characters and Unicode

Total characters6403514
Distinct characters41
Distinct categories6 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique3 ?
Unique (%)< 0.1%

Sample

1st rowWindows
2nd rowMacintosh
3rd rowWindows
4th rowLinux
5th rowAndroid

Common Values

Value | Count | Frequency (%)
Windows | 350072 | 38.7%
Macintosh | 253938 | 28.1%
Android | 123892 | 13.7%
iOS | 107665 | 11.9%
Linux | 35034 | 3.9%
Chrome OS | 26337 | 2.9%
(not set) | 4695 | 0.5%
Windows Phone | 1216 | 0.1%
Samsung | 280 | < 0.1%
BlackBerry | 218 | < 0.1%
Other values (10) | 306 | < 0.1%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
windows 351288
37.5%
macintosh 253938
27.1%
android 123892
 
13.2%
ios 107665
 
11.5%
linux 35034
 
3.7%
os 26426
 
2.8%
chrome 26337
 
2.8%
not 4695
 
0.5%
set 4695
 
0.5%
phone 1216
 
0.1%
Other values (14) 941
 
0.1%

Most occurring characters

ValueCountFrequency (%)
i 872314
13.6%
n 770618
12.0%
o 761662
11.9%
s 610201
9.5%
d 599208
9.4%
W 351423
 
5.5%
w 351288
 
5.5%
h 281491
 
4.4%
t 263464
 
4.1%
a 254438
 
4.0%
Other values (31) 1287407
20.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 5300528
82.8%
Uppercase Letter 1061121
 
16.6%
Space Separator 32474
 
0.5%
Open Punctuation 4695
 
0.1%
Close Punctuation 4695
 
0.1%
Decimal Number 1
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
i 872314
16.5%
n 770618
14.5%
o 761662
14.4%
s 610201
11.5%
d 599208
11.3%
w 351288
6.6%
h 281491
 
5.3%
t 263464
 
5.0%
a 254438
 
4.8%
c 254156
 
4.8%
Other values (12) 281688
 
5.3%
Uppercase Letter
ValueCountFrequency (%)
W 351423
33.1%
M 253939
23.9%
S 134385
 
12.7%
O 134094
 
12.6%
A 123892
 
11.7%
L 35034
 
3.3%
C 26338
 
2.5%
P 1216
 
0.1%
B 447
 
< 0.1%
N 139
 
< 0.1%
Other values (5) 214
 
< 0.1%
Space Separator
ValueCountFrequency (%)
32474
100.0%
Open Punctuation
ValueCountFrequency (%)
( 4695
100.0%
Close Punctuation
ValueCountFrequency (%)
) 4695
100.0%
Decimal Number
ValueCountFrequency (%)
3 1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 6361649
99.3%
Common 41865
 
0.7%

Most frequent character per script

Latin
ValueCountFrequency (%)
i 872314
13.7%
n 770618
12.1%
o 761662
12.0%
s 610201
9.6%
d 599208
9.4%
W 351423
 
5.5%
w 351288
 
5.5%
h 281491
 
4.4%
t 263464
 
4.1%
a 254438
 
4.0%
Other values (27) 1245542
19.6%
Common
ValueCountFrequency (%)
32474
77.6%
( 4695
 
11.2%
) 4695
 
11.2%
3 1
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 6403514
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
i 872314
13.6%
n 770618
12.0%
o 761662
11.9%
s 610201
9.5%
d 599208
9.4%
W 351423
 
5.5%
w 351288
 
5.5%
h 281491
 
4.4%
t 263464
 
4.1%
a 254438
 
4.0%
Other values (31) 1287407
20.1%

isMobile
Boolean

HIGH CORRELATION 

Distinct2
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size882.6 KiB
False
664530 
True
239123 
ValueCountFrequency (%)
False 664530
73.5%
True 239123
 
26.5%

deviceCategory
Categorical

HIGH CORRELATION 

Distinct3
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size54.9 MiB
desktop
664479 
mobile
208725 
tablet
 
30449

Length

Max length7
Median length7
Mean length6.7353254
Min length6

Characters and Unicode

Total characters6086397
Distinct characters12
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowdesktop
2nd rowdesktop
3rd rowdesktop
4th rowdesktop
5th rowmobile

Common Values

Value | Count | Frequency (%)
desktop | 664479 | 73.5%
mobile | 208725 | 23.1%
tablet | 30449 | 3.4%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
desktop 664479
73.5%
mobile 208725
 
23.1%
tablet 30449
 
3.4%

Most occurring characters

ValueCountFrequency (%)
e 903653
14.8%
o 873204
14.3%
t 725377
11.9%
d 664479
10.9%
s 664479
10.9%
k 664479
10.9%
p 664479
10.9%
b 239174
 
3.9%
l 239174
 
3.9%
m 208725
 
3.4%
Other values (2) 239174
 
3.9%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 6086397
100.0%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 903653
14.8%
o 873204
14.3%
t 725377
11.9%
d 664479
10.9%
s 664479
10.9%
k 664479
10.9%
p 664479
10.9%
b 239174
 
3.9%
l 239174
 
3.9%
m 208725
 
3.4%
Other values (2) 239174
 
3.9%

Most occurring scripts

ValueCountFrequency (%)
Latin 6086397
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 903653
14.8%
o 873204
14.3%
t 725377
11.9%
d 664479
10.9%
s 664479
10.9%
k 664479
10.9%
p 664479
10.9%
b 239174
 
3.9%
l 239174
 
3.9%
m 208725
 
3.4%
Other values (2) 239174
 
3.9%

Most occurring blocks

ValueCountFrequency (%)
ASCII 6086397
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 903653
14.8%
o 873204
14.3%
t 725377
11.9%
d 664479
10.9%
s 664479
10.9%
k 664479
10.9%
p 664479
10.9%
b 239174
 
3.9%
l 239174
 
3.9%
m 208725
 
3.4%
Other values (2) 239174
 
3.9%

hits
Real number (ℝ)

HIGH CORRELATION 

Distinct 274
Distinct (%) < 0.1%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Mean 4.5965376
Minimum 1
Maximum 500
Zeros 0
Zeros (%) 0.0%
Negative 0
Negative (%) 0.0%
Memory size 6.9 MiB

Quantile statistics

Minimum 1
5-th percentile 1
Q1 1
Median 2
Q3 4
95-th percentile 18
Maximum 500
Range 499
Interquartile range (IQR) 3

Descriptive statistics

Standard deviation 9.6414371
Coefficient of variation (CV) 2.0975434
Kurtosis 230.35198
Mean 4.5965376
Median Absolute Deviation (MAD) 1
Skewness 9.7804556
Sum 4153675
Variance 92.957308
Monotonicity Not monotonic
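Several of the summary figures for hits are tied together by simple identities (mean = sum / count, variance = standard deviation squared, CV = standard deviation / mean, IQR = Q3 − Q1, range = maximum − minimum). The sketch below checks them using only the values printed in this report; the variable names are illustrative, and the row count of 903653 is the number of observations from the overview, which applies here because hits has no missing values.

```python
import math

# Values copied from the hits summary in this report.
n_rows = 903653          # number of observations (hits has 0 missing)
total = 4153675          # Sum
mean = 4.5965376         # Mean
std = 9.6414371          # Standard deviation
variance = 92.957308     # Variance
cv = 2.0975434           # Coefficient of variation (CV)
q1, q3 = 1, 4            # Q1 and Q3
minimum, maximum = 1, 500

# Each entry maps an identity to whether the reported figures satisfy it.
checks = {
    "mean = sum / n": math.isclose(total / n_rows, mean, rel_tol=1e-6),
    "variance = std^2": math.isclose(std ** 2, variance, rel_tol=1e-6),
    "cv = std / mean": math.isclose(std / mean, cv, rel_tol=1e-6),
    "iqr = q3 - q1": (q3 - q1) == 3,
    "range = max - min": (maximum - minimum) == 499,
}

for name, ok in checks.items():
    print(f"{name}: {'consistent' if ok else 'inconsistent'}")
```

The same checks can be repeated for other numeric variables, provided the count used for the mean excludes missing rows.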
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
1 | 446754 | 49.4%
2 | 137952 | 15.3%
3 | 70402 | 7.8%
4 | 42444 | 4.7%
5 | 30939 | 3.4%
6 | 23918 | 2.6%
7 | 19518 | 2.2%
8 | 15484 | 1.7%
9 | 12959 | 1.4%
10 | 10640 | 1.2%
Other values (264) | 92643 | 10.3%

Value | Count | Frequency (%)
1 | 446754 | 49.4%
2 | 137952 | 15.3%
3 | 70402 | 7.8%
4 | 42444 | 4.7%
5 | 30939 | 3.4%
6 | 23918 | 2.6%
7 | 19518 | 2.2%
8 | 15484 | 1.7%
9 | 12959 | 1.4%
10 | 10640 | 1.2%

Value | Count | Frequency (%)
500 | 10 | < 0.1%
489 | 1 | < 0.1%
483 | 1 | < 0.1%
471 | 1 | < 0.1%
445 | 1 | < 0.1%
437 | 1 | < 0.1%
406 | 1 | < 0.1%
387 | 1 | < 0.1%
386 | 1 | < 0.1%
385 | 2 | < 0.1%

pageviews
Real number (ℝ)

HIGH CORRELATION 

Distinct 213
Distinct (%) < 0.1%
Missing 100
Missing (%) < 0.1%
Infinite 0
Infinite (%) 0.0%
Mean 3.8497642
Minimum 1
Maximum 469
Zeros 0
Zeros (%) 0.0%
Negative 0
Negative (%) 0.0%
Memory size 6.9 MiB

Quantile statistics

Minimum 1
5-th percentile 1
Q1 1
Median 1
Q3 4
95-th percentile 15
Maximum 469
Range 468
Interquartile range (IQR) 3

Descriptive statistics

Standard deviation 7.025274
Coefficient of variation (CV) 1.8248582
Kurtosis 237.41488
Mean 3.8497642
Median Absolute Deviation (MAD) 0
Skewness 9.2150555
Sum 3478466
Variance 49.354474
Monotonicity Not monotonic
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
1 | 452522 | 50.1%
2 | 143770 | 15.9%
3 | 73835 | 8.2%
4 | 45192 | 5.0%
5 | 33411 | 3.7%
6 | 24688 | 2.7%
7 | 19476 | 2.2%
8 | 15272 | 1.7%
9 | 12585 | 1.4%
10 | 10104 | 1.1%
Other values (203) | 72698 | 8.0%

Value | Count | Frequency (%)
1 | 452522 | 50.1%
2 | 143770 | 15.9%
3 | 73835 | 8.2%
4 | 45192 | 5.0%
5 | 33411 | 3.7%
6 | 24688 | 2.7%
7 | 19476 | 2.2%
8 | 15272 | 1.7%
9 | 12585 | 1.4%
10 | 10104 | 1.1%

Value | Count | Frequency (%)
469 | 1 | < 0.1%
466 | 1 | < 0.1%
431 | 1 | < 0.1%
429 | 1 | < 0.1%
400 | 1 | < 0.1%
358 | 1 | < 0.1%
351 | 1 | < 0.1%
343 | 1 | < 0.1%
341 | 2 | < 0.1%
340 | 1 | < 0.1%

transactionRevenue
Real number (ℝ)

MISSING  SKEWED 

Distinct 5332
Distinct (%) 46.3%
Missing 892138
Missing (%) 98.7%
Infinite 0
Infinite (%) 0.0%
Mean 1.3374479 × 10⁸
Minimum 10000
Maximum 2.31295 × 10¹⁰
Zeros 0
Zeros (%) 0.0%
Negative 0
Negative (%) 0.0%
Memory size 6.9 MiB

Quantile statistics

Minimum 10000
5-th percentile 10630000
Q1 24930000
Median 49450000
Q3 1.07655 × 10⁸
95-th percentile 4.92576 × 10⁸
Maximum 2.31295 × 10¹⁰
Range 2.312949 × 10¹⁰
Interquartile range (IQR) 82725000

Descriptive statistics

Standard deviation 4.4828523 × 10⁸
Coefficient of variation (CV) 3.3517959
Kurtosis 1020.3068
Mean 1.3374479 × 10⁸
Median Absolute Deviation (MAD) 30470000
Skewness 25.722703
Sum 1.5400712 × 10¹²
Variance 2.0095965 × 10¹⁷
Monotonicity Not monotonic
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
16990000 | 256 | < 0.1%
18990000 | 189 | < 0.1%
33590000 | 187 | < 0.1%
44790000 | 170 | < 0.1%
13590000 | 135 | < 0.1%
55990000 | 122 | < 0.1%
19990000 | 116 | < 0.1%
15990000 | 98 | < 0.1%
15190000 | 93 | < 0.1%
19190000 | 92 | < 0.1%
Other values (5322) | 10057 | 1.1%
(Missing) | 892138 | 98.7%

Value | Count | Frequency (%)
10000 | 1 | < 0.1%
40000 | 1 | < 0.1%
90000 | 1 | < 0.1%
160000 | 1 | < 0.1%
200000 | 1 | < 0.1%
490000 | 1 | < 0.1%
770000 | 1 | < 0.1%
790000 | 1 | < 0.1%
990000 | 1 | < 0.1%
1200000 | 7 | < 0.1%

Value | Count | Frequency (%)
2.31295 × 10¹⁰ | 1 | < 0.1%
1.78555 × 10¹⁰ | 1 | < 0.1%
1.602375 × 10¹⁰ | 1 | < 0.1%
1.058914 × 10¹⁰ | 1 | < 0.1%
8677830000 | 1 | < 0.1%
8248800000 | 1 | < 0.1%
6996500000 | 1 | < 0.1%
6826960000 | 1 | < 0.1%
6248750000 | 1 | < 0.1%
5614440000 | 1 | < 0.1%

campaign
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct10
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size57.4 MiB
(not set)
865347 
Data Share Promo
 
16403
AW - Dynamic Search Ads Whole Site
 
14244
AW - Accessories
 
7070
test-liyuhz
 
392
Other values (5)
 
197

Length

Max length47
Median length9
Mean length9.5797779
Min length9

Characters and Unicode

Total characters8656795
Distinct characters34
Distinct categories6 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique1 ?
Unique (%)< 0.1%

Sample

1st row(not set)
2nd row(not set)
3rd row(not set)
4th row(not set)
5th row(not set)

Common Values

Value | Count | Frequency (%)
(not set) | 865347 | 95.8%
Data Share Promo | 16403 | 1.8%
AW - Dynamic Search Ads Whole Site | 14244 | 1.6%
AW - Accessories | 7070 | 0.8%
test-liyuhz | 392 | < 0.1%
AW - Electronics | 96 | < 0.1%
Retail (DO NOT EDIT owners nophakun and tianyu) | 50 | < 0.1%
AW - Apparel | 46 | < 0.1%
All Products | 4 | < 0.1%
Data Share | 1 | < 0.1%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
not 865397
45.5%
set 865347
45.5%
aw 21456
 
1.1%
21456
 
1.1%
data 16404
 
0.9%
share 16404
 
0.9%
promo 16403
 
0.9%
dynamic 14244
 
0.7%
search 14244
 
0.7%
ads 14244
 
0.7%
Other values (15) 36450
 
1.9%

Most occurring characters

ValueCountFrequency (%)
t 1762326
20.4%
998396
11.5%
e 939257
10.8%
o 919667
10.6%
s 901343
10.4%
n 879937
10.2%
) 865397
10.0%
( 865397
10.0%
a 77946
 
0.9%
r 54317
 
0.6%
Other values (24) 392812
 
4.5%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 5734694
66.2%
Space Separator 998396
 
11.5%
Close Punctuation 865397
 
10.0%
Open Punctuation 865397
 
10.0%
Uppercase Letter 171063
 
2.0%
Dash Punctuation 21848
 
0.3%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
t 1762326
30.7%
e 939257
16.4%
o 919667
16.0%
s 901343
15.7%
n 879937
15.3%
a 77946
 
1.4%
r 54317
 
0.9%
h 45334
 
0.8%
c 42824
 
0.7%
i 36146
 
0.6%
Other values (9) 75597
 
1.3%
Uppercase Letter
ValueCountFrequency (%)
S 44892
26.2%
A 42820
25.0%
W 35700
20.9%
D 30748
18.0%
P 16407
 
9.6%
E 146
 
0.1%
O 100
 
0.1%
T 100
 
0.1%
R 50
 
< 0.1%
N 50
 
< 0.1%
Space Separator
ValueCountFrequency (%)
998396
100.0%
Close Punctuation
ValueCountFrequency (%)
) 865397
100.0%
Open Punctuation
ValueCountFrequency (%)
( 865397
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 21848
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 5905757
68.2%
Common 2751038
31.8%

Most frequent character per script

Latin
ValueCountFrequency (%)
t 1762326
29.8%
e 939257
15.9%
o 919667
15.6%
s 901343
15.3%
n 879937
14.9%
a 77946
 
1.3%
r 54317
 
0.9%
h 45334
 
0.8%
S 44892
 
0.8%
c 42824
 
0.7%
Other values (20) 237914
 
4.0%
Common
ValueCountFrequency (%)
998396
36.3%
) 865397
31.5%
( 865397
31.5%
- 21848
 
0.8%

Most occurring blocks

ValueCountFrequency (%)
ASCII 8656795
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
t 1762326
20.4%
998396
11.5%
e 939257
10.8%
o 919667
10.6%
s 901343
10.4%
n 879937
10.2%
) 865397
10.0%
( 865397
10.0%
a 77946
 
0.9%
r 54317
 
0.6%
Other values (24) 392812
 
4.5%

source
Categorical

HIGH CARDINALITY  IMBALANCE 

Distinct380
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size1.8 MiB
google
400788 
youtube.com
212602 
(direct)
143028 
mall.googleplex.com
66416 
Partners
 
16411
Other values (375)
64408 

Length

Max length60
Median length49
Mean length9.0206938
Min length3

Characters and Unicode

Total characters8151577
Distinct characters42
Distinct categories7 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique125 ?
Unique (%)< 0.1%

Sample

1st rowgoogle
2nd rowgoogle
3rd rowgoogle
4th rowgoogle
5th rowgoogle

Common Values

Value | Count | Frequency (%)
google | 400788 | 44.4%
youtube.com | 212602 | 23.5%
(direct) | 143028 | 15.8%
mall.googleplex.com | 66416 | 7.3%
Partners | 16411 | 1.8%
analytics.google.com | 16172 | 1.8%
dfa | 5686 | 0.6%
google.com | 4669 | 0.5%
m.facebook.com | 3365 | 0.4%
baidu | 3356 | 0.4%
Other values (370) | 31160 | 3.4%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
google 400788
44.4%
youtube.com 212602
23.5%
direct 143028
 
15.8%
mall.googleplex.com 66416
 
7.3%
partners 16411
 
1.8%
analytics.google.com 16172
 
1.8%
dfa 5686
 
0.6%
google.com 4669
 
0.5%
m.facebook.com 3365
 
0.4%
baidu 3356
 
0.4%
Other values (370) 31160
 
3.4%

Most occurring characters

ValueCountFrequency (%)
o 1571376
19.3%
g 1006376
12.3%
e 958629
11.8%
l 730101
9.0%
c 500801
 
6.1%
u 437502
 
5.4%
. 434839
 
5.3%
t 402952
 
4.9%
m 401095
 
4.9%
y 233355
 
2.9%
Other values (32) 1474551
18.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 7412657
90.9%
Other Punctuation 434933
 
5.3%
Open Punctuation 143097
 
1.8%
Close Punctuation 143097
 
1.8%
Uppercase Letter 16411
 
0.2%
Decimal Number 971
 
< 0.1%
Dash Punctuation 411
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
o 1571376
21.2%
g 1006376
13.6%
e 958629
12.9%
l 730101
9.8%
c 500801
 
6.8%
u 437502
 
5.9%
t 402952
 
5.4%
m 401095
 
5.4%
y 233355
 
3.1%
b 228412
 
3.1%
Other values (16) 942058
12.7%
Decimal Number
ValueCountFrequency (%)
0 305
31.4%
2 273
28.1%
8 132
13.6%
9 97
 
10.0%
3 39
 
4.0%
1 38
 
3.9%
5 31
 
3.2%
6 24
 
2.5%
4 20
 
2.1%
7 12
 
1.2%
Other Punctuation
ValueCountFrequency (%)
. 434839
> 99.9%
: 94
 
< 0.1%
Open Punctuation
ValueCountFrequency (%)
( 143097
100.0%
Close Punctuation
ValueCountFrequency (%)
) 143097
100.0%
Uppercase Letter
ValueCountFrequency (%)
P 16411
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 411
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 7429068
91.1%
Common 722509
 
8.9%

Most frequent character per script

Latin
ValueCountFrequency (%)
o 1571376
21.2%
g 1006376
13.5%
e 958629
12.9%
l 730101
9.8%
c 500801
 
6.7%
u 437502
 
5.9%
t 402952
 
5.4%
m 401095
 
5.4%
y 233355
 
3.1%
b 228412
 
3.1%
Other values (17) 958469
12.9%
Common
ValueCountFrequency (%)
. 434839
60.2%
( 143097
 
19.8%
) 143097
 
19.8%
- 411
 
0.1%
0 305
 
< 0.1%
2 273
 
< 0.1%
8 132
 
< 0.1%
9 97
 
< 0.1%
: 94
 
< 0.1%
3 39
 
< 0.1%
Other values (5) 125
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 8151577
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
o 1571376
19.3%
g 1006376
12.3%
e 958629
11.8%
l 730101
9.0%
c 500801
 
6.1%
u 437502
 
5.4%
. 434839
 
5.3%
t 402952
 
4.9%
m 401095
 
4.9%
y 233355
 
2.9%
Other values (32) 1474551
18.1%

medium
Categorical

HIGH CORRELATION 

Distinct7
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size55.2 MiB
organic
381561 
referral
330955 
(none)
143026 
cpc
 
25326
affiliate
 
16403
Other values (2)
 
6382

Length

Max length9
Median length8
Mean length7.1047117
Min length3

Characters and Unicode

Total characters6420194
Distinct characters17
Distinct categories4 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st roworganic
2nd roworganic
3rd roworganic
4th roworganic
5th roworganic

Common Values

Value | Count | Frequency (%)
organic | 381561 | 42.2%
referral | 330955 | 36.6%
(none) | 143026 | 15.8%
cpc | 25326 | 2.8%
affiliate | 16403 | 1.8%
cpm | 6262 | 0.7%
(not set) | 120 | < 0.1%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
organic 381561
42.2%
referral 330955
36.6%
none 143026
 
15.8%
cpc 25326
 
2.8%
affiliate 16403
 
1.8%
cpm 6262
 
0.7%
not 120
 
< 0.1%
set 120
 
< 0.1%

Most occurring characters

ValueCountFrequency (%)
r 1374426
21.4%
e 821459
12.8%
a 745322
11.6%
n 667733
10.4%
o 524707
 
8.2%
c 438475
 
6.8%
i 414367
 
6.5%
g 381561
 
5.9%
f 363761
 
5.7%
l 347358
 
5.4%
Other values (7) 341025
 
5.3%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 6133782
95.5%
Open Punctuation 143146
 
2.2%
Close Punctuation 143146
 
2.2%
Space Separator 120
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
r 1374426
22.4%
e 821459
13.4%
a 745322
12.2%
n 667733
10.9%
o 524707
 
8.6%
c 438475
 
7.1%
i 414367
 
6.8%
g 381561
 
6.2%
f 363761
 
5.9%
l 347358
 
5.7%
Other values (4) 54613
 
0.9%
Open Punctuation
ValueCountFrequency (%)
( 143146
100.0%
Close Punctuation
ValueCountFrequency (%)
) 143146
100.0%
Space Separator
ValueCountFrequency (%)
120
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 6133782
95.5%
Common 286412
 
4.5%

Most frequent character per script

Latin
ValueCountFrequency (%)
r 1374426
22.4%
e 821459
13.4%
a 745322
12.2%
n 667733
10.9%
o 524707
 
8.6%
c 438475
 
7.1%
i 414367
 
6.8%
g 381561
 
6.2%
f 363761
 
5.9%
l 347358
 
5.7%
Other values (4) 54613
 
0.9%
Common
ValueCountFrequency (%)
( 143146
50.0%
) 143146
50.0%
120
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 6420194
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
r 1374426
21.4%
e 821459
12.8%
a 745322
11.6%
n 667733
10.4%
o 524707
 
8.2%
c 438475
 
6.8%
i 414367
 
6.5%
g 381561
 
5.9%
f 363761
 
5.7%
l 347358
 
5.4%
Other values (7) 341025
 
5.3%

keyword
Categorical

HIGH CARDINALITY  IMBALANCE  MISSING 

Distinct3636
Distinct (%)0.9%
Missing502929
Missing (%)55.7%
Memory size2.1 MiB
(not-provided)
366363 
6qEhsCssdK0z36ri
 
11503
(Remarketing/Content-targeting)
 
2298
1hZbAqLCbjwfgOH7
 
2264
google-merchandise-store
 
2209
Other values (3631)
 
16087

Length

Max length147
Median length14
Mean length14.332388
Min length1

Characters and Unicode

Total characters5743332
Distinct characters294
Distinct categories17 ?
Distinct scripts13 ?
Distinct blocks14 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique2742 ?
Unique (%)0.7%

Sample

1st row(not-provided)
2nd row(not-provided)
3rd row(not-provided)
4th rowgoogle-+-online
5th row(not-provided)

Common Values

Value | Count | Frequency (%)
(not-provided) | 366363 | 40.5%
6qEhsCssdK0z36ri | 11503 | 1.3%
(Remarketing/Content-targeting) | 2298 | 0.3%
1hZbAqLCbjwfgOH7 | 2264 | 0.3%
google-merchandise-store | 2209 | 0.2%
Google-Merchandise | 1648 | 0.2%
google-store | 1277 | 0.1%
youtube | 568 | 0.1%
(User-vertical-targeting) | 489 | 0.1%
1X4Me6ZKNV0zg-jV | 467 | 0.1%
Other values (3626) | 11638 | 1.3%
(Missing) | 502929 | 55.7%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
not-provided 366363
91.4%
6qehscssdk0z36ri 11503
 
2.9%
remarketing/content-targeting 2298
 
0.6%
google-merchandise-store 2266
 
0.6%
1hzbaqlcbjwfgoh7 2264
 
0.6%
google-merchandise 1904
 
0.5%
google-store 1325
 
0.3%
youtube 845
 
0.2%
user-vertical-targeting 489
 
0.1%
1x4me6zknv0zg-jv 467
 
0.1%
Other values (3348) 11000
 
2.7%

Most occurring characters

ValueCountFrequency (%)
o 772784
13.5%
d 752095
13.1%
e 416576
 
7.3%
r 401124
 
7.0%
t 397351
 
6.9%
i 395175
 
6.9%
- 392656
 
6.8%
n 386835
 
6.7%
( 369878
 
6.4%
) 369877
 
6.4%
Other values (284) 1088981
19.0%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 4480669
78.0%
Dash Punctuation 392658
 
6.8%
Open Punctuation 369878
 
6.4%
Close Punctuation 369877
 
6.4%
Uppercase Letter 66215
 
1.2%
Decimal Number 53129
 
0.9%
Other Punctuation 5413
 
0.1%
Math Symbol 4796
 
0.1%
Other Letter 496
 
< 0.1%
Connector Punctuation 99
 
< 0.1%
Other values (7) 102
 
< 0.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
38
 
7.7%
37
 
7.5%
34
 
6.9%
32
 
6.5%
17
 
3.4%
16
 
3.2%
13
 
2.6%
12
 
2.4%
12
 
2.4%
11
 
2.2%
Other values (118) 274
55.2%
Lowercase Letter
ValueCountFrequency (%)
o 772784
17.2%
d 752095
16.8%
e 416576
9.3%
r 401124
9.0%
t 397351
8.9%
i 395175
8.8%
n 386835
8.6%
p 368763
8.2%
v 367210
8.2%
s 52440
 
1.2%
Other values (71) 170316
 
3.8%
Uppercase Letter
ValueCountFrequency (%)
C 16135
24.4%
K 12009
18.1%
E 11748
17.7%
G 3090
 
4.7%
M 2992
 
4.5%
Z 2732
 
4.1%
A 2494
 
3.8%
R 2472
 
3.7%
O 2455
 
3.7%
H 2376
 
3.6%
Other values (18) 7712
11.6%
Other Punctuation
ValueCountFrequency (%)
/ 3604
66.6%
. 1181
 
21.8%
: 332
 
6.1%
& 144
 
2.7%
? 43
 
0.8%
' 33
 
0.6%
* 26
 
0.5%
, 17
 
0.3%
" 15
 
0.3%
# 7
 
0.1%
Other values (4) 11
 
0.2%
Spacing Mark
ValueCountFrequency (%)
29
44.6%
14
21.5%
ি 8
 
12.3%
3
 
4.6%
3
 
4.6%
2
 
3.1%
2
 
3.1%
1
 
1.5%
1
 
1.5%
1
 
1.5%
Decimal Number
ValueCountFrequency (%)
6 23566
44.4%
0 12053
22.7%
3 11550
21.7%
1 2936
 
5.5%
7 2288
 
4.3%
4 507
 
1.0%
2 127
 
0.2%
5 53
 
0.1%
9 32
 
0.1%
8 17
 
< 0.1%
Nonspacing Mark
ValueCountFrequency (%)
9
31.0%
5
17.2%
5
17.2%
3
 
10.3%
2
 
6.9%
2
 
6.9%
1
 
3.4%
1
 
3.4%
1
 
3.4%
Dash Punctuation
ValueCountFrequency (%)
- 392656
> 99.9%
1
 
< 0.1%
1
 
< 0.1%
Math Symbol
ValueCountFrequency (%)
+ 4577
95.4%
= 219
 
4.6%
Open Punctuation
ValueCountFrequency (%)
( 369878
100.0%
Close Punctuation
ValueCountFrequency (%)
) 369877
100.0%
Connector Punctuation
ValueCountFrequency (%)
_ 99
100.0%
Format
ValueCountFrequency (%)
3
100.0%
Modifier Letter
ValueCountFrequency (%)
2
100.0%
Modifier Symbol
ValueCountFrequency (%)
¨ 1
100.0%
Other Symbol
ValueCountFrequency (%)
👉 1
100.0%
Currency Symbol
ValueCountFrequency (%)
$ 1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 4546613
79.2%
Common 1195858
 
20.8%
Cyrillic 227
 
< 0.1%
Bengali 190
 
< 0.1%
Han 186
 
< 0.1%
Katakana 71
 
< 0.1%
Devanagari 69
 
< 0.1%
Arabic 50
 
< 0.1%
Greek 44
 
< 0.1%
Hangul 10
 
< 0.1%
Other values (3) 14
 
< 0.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
o 772784
17.0%
d 752095
16.5%
e 416576
9.2%
r 401124
8.8%
t 397351
8.7%
i 395175
8.7%
n 386835
8.5%
p 368763
8.1%
v 367210
8.1%
s 52440
 
1.2%
Other values (59) 236260
 
5.2%
Common
ValueCountFrequency (%)
- 392656
32.8%
( 369878
30.9%
) 369877
30.9%
6 23566
 
2.0%
0 12053
 
1.0%
3 11550
 
1.0%
+ 4577
 
0.4%
/ 3604
 
0.3%
1 2936
 
0.2%
7 2288
 
0.2%
Other values (27) 2873
 
0.2%
Bengali
ValueCountFrequency (%)
29
15.3%
17
 
8.9%
16
 
8.4%
14
 
7.4%
9
 
4.7%
9
 
4.7%
ি 8
 
4.2%
8
 
4.2%
7
 
3.7%
6
 
3.2%
Other values (25) 67
35.3%
Han
ValueCountFrequency (%)
38
20.4%
37
19.9%
34
18.3%
32
17.2%
11
 
5.9%
10
 
5.4%
2
 
1.1%
1
 
0.5%
1
 
0.5%
1
 
0.5%
Other values (19) 19
10.2%
Cyrillic
ValueCountFrequency (%)
а 21
 
9.3%
о 20
 
8.8%
г 20
 
8.8%
т 19
 
8.4%
р 16
 
7.0%
н 16
 
7.0%
у 13
 
5.7%
и 13
 
5.7%
е 11
 
4.8%
к 11
 
4.8%
Other values (15) 67
29.5%
Devanagari
ValueCountFrequency (%)
7
 
10.1%
7
 
10.1%
5
 
7.2%
5
 
7.2%
5
 
7.2%
4
 
5.8%
4
 
5.8%
4
 
5.8%
3
 
4.3%
3
 
4.3%
Other values (15) 22
31.9%
Katakana
ValueCountFrequency (%)
13
18.3%
12
16.9%
12
16.9%
5
 
7.0%
5
 
7.0%
3
 
4.2%
3
 
4.2%
2
 
2.8%
2
 
2.8%
2
 
2.8%
Other values (11) 12
16.9%
Arabic
ValueCountFrequency (%)
و 11
22.0%
ي 10
20.0%
ت 7
14.0%
ب 6
12.0%
ی 2
 
4.0%
س 2
 
4.0%
ج 2
 
4.0%
ل 2
 
4.0%
غ 2
 
4.0%
ک 1
 
2.0%
Other values (5) 5
10.0%
Greek
ValueCountFrequency (%)
ο 8
18.2%
τ 5
11.4%
α 5
11.4%
λ 5
11.4%
ι 3
 
6.8%
υ 3
 
6.8%
ε 3
 
6.8%
γ 3
 
6.8%
π 3
 
6.8%
ν 1
 
2.3%
Other values (5) 5
11.4%
Hangul
ValueCountFrequency (%)
1
10.0%
1
10.0%
1
10.0%
1
10.0%
1
10.0%
1
10.0%
1
10.0%
1
10.0%
1
10.0%
1
10.0%
Hebrew
ValueCountFrequency (%)
ס 2
28.6%
א 1
14.3%
ת 1
14.3%
ר 1
14.3%
י 1
14.3%
ק 1
14.3%
Hiragana
ValueCountFrequency (%)
1
25.0%
1
25.0%
1
25.0%
1
25.0%
Gurmukhi
ValueCountFrequency (%)
1
33.3%
1
33.3%
1
33.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII 5742438
> 99.9%
Cyrillic 227
 
< 0.1%
Bengali 190
 
< 0.1%
CJK 186
 
< 0.1%
Katakana 73
 
< 0.1%
None 70
 
< 0.1%
Devanagari 69
 
< 0.1%
Arabic 50
 
< 0.1%
Hebrew 7
 
< 0.1%
Compat Jamo 6
 
< 0.1%
Other values (4) 16
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
o 772784
13.5%
d 752095
13.1%
e 416576
 
7.3%
r 401124
 
7.0%
t 397351
 
6.9%
i 395175
 
6.9%
- 392656
 
6.8%
n 386835
 
6.7%
( 369878
 
6.4%
) 369877
 
6.4%
Other values (73) 1088087
18.9%
CJK
ValueCountFrequency (%)
38
20.4%
37
19.9%
34
18.3%
32
17.2%
11
 
5.9%
10
 
5.4%
2
 
1.1%
1
 
0.5%
1
 
0.5%
1
 
0.5%
Other values (19) 19
10.2%
Bengali
ValueCountFrequency (%)
29
15.3%
17
 
8.9%
16
 
8.4%
14
 
7.4%
9
 
4.7%
9
 
4.7%
ি 8
 
4.2%
8
 
4.2%
7
 
3.7%
6
 
3.2%
Other values (25) 67
35.3%
Cyrillic
ValueCountFrequency (%)
а 21
 
9.3%
о 20
 
8.8%
г 20
 
8.8%
т 19
 
8.4%
р 16
 
7.0%
н 16
 
7.0%
у 13
 
5.7%
и 13
 
5.7%
е 11
 
4.8%
к 11
 
4.8%
Other values (15) 67
29.5%
Katakana
ValueCountFrequency (%)
13
17.8%
12
16.4%
12
16.4%
5
 
6.8%
5
 
6.8%
3
 
4.1%
3
 
4.1%
2
 
2.7%
2
 
2.7%
2
 
2.7%
Other values (12) 14
19.2%
Arabic
ValueCountFrequency (%)
و 11
22.0%
ي 10
20.0%
ت 7
14.0%
ب 6
12.0%
ی 2
 
4.0%
س 2
 
4.0%
ج 2
 
4.0%
ل 2
 
4.0%
غ 2
 
4.0%
ک 1
 
2.0%
Other values (5) 5
10.0%
None
ValueCountFrequency (%)
ο 8
 
11.4%
ñ 7
 
10.0%
τ 5
 
7.1%
α 5
 
7.1%
λ 5
 
7.1%
ι 3
 
4.3%
υ 3
 
4.3%
ε 3
 
4.3%
γ 3
 
4.3%
π 3
 
4.3%
Other values (24) 25
35.7%
Devanagari
ValueCountFrequency (%)
7
 
10.1%
7
 
10.1%
5
 
7.2%
5
 
7.2%
5
 
7.2%
4
 
5.8%
4
 
5.8%
4
 
5.8%
3
 
4.3%
3
 
4.3%
Other values (15) 22
31.9%
Punctuation
ValueCountFrequency (%)
3
60.0%
1
 
20.0%
1
 
20.0%
Hebrew
ValueCountFrequency (%)
ס 2
28.6%
א 1
14.3%
ת 1
14.3%
ר 1
14.3%
י 1
14.3%
ק 1
14.3%
Gurmukhi
ValueCountFrequency (%)
1
33.3%
1
33.3%
1
33.3%
Compat Jamo
ValueCountFrequency (%)
1
16.7%
1
16.7%
1
16.7%
1
16.7%
1
16.7%
1
16.7%
Hangul
ValueCountFrequency (%)
1
25.0%
1
25.0%
1
25.0%
1
25.0%
Hiragana
ValueCountFrequency (%)
1
25.0%
1
25.0%
1
25.0%
1
25.0%

referralPath
Path

MISSING 

Distinct1475
Distinct (%)0.4%
Missing572712
Missing (%)63.4%
Memory size39.5 MiB
/
75523 
/yt/about/
71036 
/analytics/web/
14620 
/yt/about/tr/
14599 
/yt/about/vi/
 
13753
Other values (1470)
141410 

Length

Max length270
Median length227
Mean length12.654307
Min length1

Characters and Unicode

Total characters4187829
Distinct characters80
Distinct categories8 ?
Distinct scripts3 ?
Distinct blocks2 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique616 ?
Unique (%)0.2%

Sample

1st row/
2nd row/
3rd row/corp/google.com/study/incentives/working-with-perks
4th row/od/Things-To-Do-in-Silicon-Valley/fl/How-To-Visit-the-Googleplex-the-Google-Head-Office-in-Mountain-View.htm
5th row/od/Things-To-Do-in-Silicon-Valley/fl/How-To-Visit-the-Googleplex-the-Google-Head-Office-in-Mountain-View.htm

Common Values

Value | Count | Frequency (%)
/ | 75523 | 8.4%
/yt/about/ | 71036 | 7.9%
/analytics/web/ | 14620 | 1.6%
/yt/about/tr/ | 14599 | 1.6%
/yt/about/vi/ | 13753 | 1.5%
/yt/about/es-419/ | 12735 | 1.4%
/yt/about/pt-BR/ | 12003 | 1.3%
/yt/about/th/ | 11430 | 1.3%
/yt/about/ru/ | 11193 | 1.2%
/yt/about/es/ | 7092 | 0.8%
Other values (1465) | 86957 | 9.6%
(Missing) | 572712 | 63.4%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
75523
22.8%
yt/about 71036
21.5%
analytics/web 14620
 
4.4%
yt/about/tr 14599
 
4.4%
yt/about/vi 13753
 
4.2%
yt/about/es-419 12735
 
3.8%
yt/about/pt-br 12003
 
3.6%
yt/about/th 11430
 
3.5%
yt/about/ru 11193
 
3.4%
yt/about/es 7092
 
2.1%
Other values (1455) 86957
26.3%

Most occurring characters

ValueCountFrequency (%)
/ 986276
23.6%
t 539889
12.9%
o 313010
 
7.5%
a 299566
 
7.2%
u 248905
 
5.9%
b 239785
 
5.7%
y 235075
 
5.6%
e 137264
 
3.3%
- 116586
 
2.8%
i 108143
 
2.6%
Other values (70) 963330
23.0%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 2837830
67.8%
Other Punctuation 1002859
 
23.9%
Dash Punctuation 116586
 
2.8%
Uppercase Letter 113210
 
2.7%
Decimal Number 96366
 
2.3%
Connector Punctuation 19108
 
0.5%
Math Symbol 1862
 
< 0.1%
Other Letter 8
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
t 539889
19.0%
o 313010
11.0%
a 299566
10.6%
u 248905
8.8%
b 239785
8.4%
y 235075
8.3%
e 137264
 
4.8%
i 108143
 
3.8%
s 101202
 
3.6%
n 87445
 
3.1%
Other values (16) 527546
18.6%
Uppercase Letter
ValueCountFrequency (%)
B 20815
18.4%
G 13512
11.9%
R 13108
11.6%
T 10994
 
9.7%
V 6806
 
6.0%
H 5551
 
4.9%
W 4236
 
3.7%
I 4028
 
3.6%
D 3622
 
3.2%
M 3051
 
2.7%
Other values (16) 27487
24.3%
Decimal Number
ValueCountFrequency (%)
1 25651
26.6%
4 19573
20.3%
9 19453
20.2%
0 8132
 
8.4%
7 5765
 
6.0%
2 5212
 
5.4%
6 4462
 
4.6%
5 3207
 
3.3%
3 2764
 
2.9%
8 2147
 
2.2%
Other Letter
ValueCountFrequency (%)
س 2
25.0%
ف 1
12.5%
ي 1
12.5%
ل 1
12.5%
م 1
12.5%
ك 1
12.5%
ى 1
12.5%
Other Punctuation
ValueCountFrequency (%)
/ 986276
98.3%
. 15578
 
1.6%
% 836
 
0.1%
, 124
 
< 0.1%
: 44
 
< 0.1%
@ 1
 
< 0.1%
Math Symbol
ValueCountFrequency (%)
= 1829
98.2%
+ 32
 
1.7%
~ 1
 
0.1%
Dash Punctuation
ValueCountFrequency (%)
- 116586
100.0%
Connector Punctuation
ValueCountFrequency (%)
_ 19108
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 2951040
70.5%
Common 1236781
29.5%
Arabic 8
 
< 0.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
t 539889
18.3%
o 313010
10.6%
a 299566
10.2%
u 248905
 
8.4%
b 239785
 
8.1%
y 235075
 
8.0%
e 137264
 
4.7%
i 108143
 
3.7%
s 101202
 
3.4%
n 87445
 
3.0%
Other values (42) 640756
21.7%
Common
ValueCountFrequency (%)
/ 986276
79.7%
- 116586
 
9.4%
1 25651
 
2.1%
4 19573
 
1.6%
9 19453
 
1.6%
_ 19108
 
1.5%
. 15578
 
1.3%
0 8132
 
0.7%
7 5765
 
0.5%
2 5212
 
0.4%
Other values (11) 15447
 
1.2%
Arabic
ValueCountFrequency (%)
س 2
25.0%
ف 1
12.5%
ي 1
12.5%
ل 1
12.5%
م 1
12.5%
ك 1
12.5%
ى 1
12.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII 4187821
> 99.9%
Arabic 8
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
/ 986276
23.6%
t 539889
12.9%
o 313010
 
7.5%
a 299566
 
7.2%
u 248905
 
5.9%
b 239785
 
5.7%
y 235075
 
5.6%
e 137264
 
3.3%
- 116586
 
2.8%
i 108143
 
2.6%
Other values (63) 963322
23.0%
Arabic
ValueCountFrequency (%)
س 2
25.0%
ف 1
12.5%
ي 1
12.5%
ل 1
12.5%
م 1
12.5%
ك 1
12.5%
ى 1
12.5%
Common prefix: /
Unique stems: 1474
Unique names: 515
Unique extensions: 23
Unique directories: 1124
Unique anchors: 1

Full path
Value | Count | Frequency (%)
/ | 75523 | 8.4%
/yt/about/ | 71036 | 7.9%
/analytics/web/ | 14620 | 1.6%
/yt/about/tr/ | 14599 | 1.6%
/yt/about/vi/ | 13753 | 1.5%
/yt/about/es-419/ | 12735 | 1.4%
/yt/about/pt-BR/ | 12003 | 1.3%
/yt/about/th/ | 11430 | 1.3%
/yt/about/ru/ | 11193 | 1.2%
/yt/about/es/ | 7092 | 0.8%
Other values (1465) | 86957 | 9.6%
(Missing) | 572712 | 63.4%

Stem
Value | Count | Frequency (%)
/ | 75523 | 8.4%
/yt/about/ | 71036 | 7.9%
/analytics/web/ | 14620 | 1.6%
/yt/about/tr/ | 14599 | 1.6%
/yt/about/vi/ | 13753 | 1.5%
/yt/about/es-419/ | 12735 | 1.4%
/yt/about/pt-BR/ | 12003 | 1.3%
/yt/about/th/ | 11430 | 1.3%
/yt/about/ru/ | 11193 | 1.2%
/yt/about/es/ | 7092 | 0.8%
Other values (1464) | 86957 | 9.6%
(Missing) | 572712 | 63.4%

Name
Value | Count | Frequency (%)
(empty) | 304574 | 33.7%
using-the-logo.html | 5284 | 0.6%
How-To-Visit-the-Googleplex-the-Google-Head-Office-in-Mountain-View.htm | 2056 | 0.2%
index.html | 2023 | 0.2%
c10b14f9a69ff71b1b7a | 1784 | 0.2%
inpage_launch | 1638 | 0.2%
alphabet-google-discounts | 1118 | 0.1%
2145 | 1064 | 0.1%
Where-can-I-buy-a-stuffed-Go-language-gopher-mascot-online | 872 | 0.1%
mobile | 812 | 0.1%
Other values (505) | 9716 | 1.1%
(Missing) | 572712 | 63.4%

Extension
Value | Count | Frequency (%)
(empty) | 319771 | 35.4%
.html | 7848 | 0.9%
.htm | 2059 | 0.2%
.php | 724 | 0.1%
.aspx | 190 | < 0.1%
.jhtml | 145 | < 0.1%
.jspa | 80 | < 0.1%
.jsp | 73 | < 0.1%
.pdf | 15 | < 0.1%
.lai | 8 | < 0.1%
Other values (13) | 28 | < 0.1%
(Missing) | 572712 | 63.4%

Directory
Value | Count | Frequency (%)
/ | 82812 | 9.2%
/yt/about | 71408 | 7.9%
/analytics/web | 16258 | 1.8%
/yt/about/tr | 14665 | 1.6%
/yt/about/vi | 13788 | 1.5%
/yt/about/es-419 | 12832 | 1.4%
/yt/about/pt-BR | 12077 | 1.3%
/yt/about/th | 11471 | 1.3%
/yt/about/ru | 11294 | 1.2%
/yt/about/es | 7143 | 0.8%
Other values (1114) | 77193 | 8.5%
(Missing) | 572712 | 63.4%

Anchor
Value | Count | Frequency (%)
(empty) | 330941 | 36.6%
(Missing) | 572712 | 63.4%
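
The path breakdown above (stems, names, extensions, directories) can be approximated with the standard `posixpath` module. This is a sketch under assumptions: the helper name `split_path` is hypothetical, and the choice to take the stem from the full path (so the directory is kept) is inferred from the Stem table, where values such as `/yt/about/` still carry their directory; the report's actual implementation may differ.

```python
import posixpath

def split_path(raw):
    """Split a URL path into directory, name, stem and extension."""
    # posixpath.split("/yt/about/") gives ("/yt/about", ""): a trailing slash
    # yields an empty name, which matches the large empty-name row above.
    directory, name = posixpath.split(raw)
    # splitext on the full path keeps the directory in the stem (assumption).
    stem, extension = posixpath.splitext(raw)
    return {"directory": directory, "name": name, "stem": stem, "extension": extension}

for raw in ["/", "/yt/about/", "/l.php"]:
    print(raw, split_path(raw))
```

Under these rules a path ending in `/` has an empty name and an empty extension, which would explain why the empty value dominates the Name and Extension tables.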

adwordsClickInfo.page
Real number (ℝ)

HIGH CORRELATION  MISSING  SKEWED 

Distinct: 8
Distinct (%): < 0.1%
Missing: 882193
Missing (%): 97.6%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.0081081
Minimum: 1
Maximum: 14
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 6.9 MiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 1
Median: 1
Q3: 1
95-th percentile: 1
Maximum: 14
Range: 13
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.17358392
Coefficient of variation (CV): 0.1721878
Kurtosis: 2188.361
Mean: 1.0081081
Median Absolute Deviation (MAD): 0
Skewness: 40.170902
Sum: 21634
Variance: 0.030131376
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=8)
Value | Count | Frequency (%)
1 | 21362 | 2.4%
2 | 73 | < 0.1%
3 | 10 | < 0.1%
5 | 7 | < 0.1%
7 | 3 | < 0.1%
9 | 2 | < 0.1%
4 | 2 | < 0.1%
14 | 1 | < 0.1%
(Missing) | 882193 | 97.6%

Minimum values
Value | Count | Frequency (%)
1 | 21362 | 2.4%
2 | 73 | < 0.1%
3 | 10 | < 0.1%
4 | 2 | < 0.1%
5 | 7 | < 0.1%
7 | 3 | < 0.1%
9 | 2 | < 0.1%
14 | 1 | < 0.1%

Maximum values
Value | Count | Frequency (%)
14 | 1 | < 0.1%
9 | 2 | < 0.1%
7 | 3 | < 0.1%
5 | 7 | < 0.1%
4 | 2 | < 0.1%
3 | 10 | < 0.1%
2 | 73 | < 0.1%
1 | 21362 | 2.4%
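
The descriptive statistics reported for adwordsClickInfo.page can be re-derived from its frequency table alone. The sketch below does that arithmetic in plain Python; the variable names are illustrative, and the use of the sample variance (n - 1 denominator) is an inference from the fact that it reproduces the reported figure.

```python
import math

# Value -> count, copied from the frequency table for adwordsClickInfo.page.
counts = {1: 21362, 2: 73, 3: 10, 4: 2, 5: 7, 7: 3, 9: 2, 14: 1}

n = sum(counts.values())                         # non-missing observations
total = sum(v * c for v, c in counts.items())    # reported as "Sum"
mean = total / n

# Sample variance (n - 1 denominator); this is the form that matches the report.
variance = sum(c * (v - mean) ** 2 for v, c in counts.items()) / (n - 1)
std = math.sqrt(variance)
cv = std / mean                                  # coefficient of variation

print(n, total, round(mean, 7), round(variance, 9), round(std, 8), round(cv, 7))
```

The count of non-missing observations also equals 903653 - 882193, the total number of rows minus the reported missing count.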

adwordsClickInfo.slot
Categorical

HIGH CORRELATION  IMBALANCE  MISSING 

Distinct: 2
Distinct (%): < 0.1%
Missing: 882193
Missing (%): 97.6%
Memory size: 34.9 MiB
Top: 20956
RHS: 504

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 64380
Distinct characters: 6
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Top
2nd row: Top
3rd row: Top
4th row: Top
5th row: Top

Common Values

Value | Count | Frequency (%)
Top | 20956 | 2.3%
RHS | 504 | 0.1%
(Missing) | 882193 | 97.6%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
top 20956
97.7%
rhs 504
 
2.3%

Most occurring characters

ValueCountFrequency (%)
T 20956
32.6%
o 20956
32.6%
p 20956
32.6%
R 504
 
0.8%
H 504
 
0.8%
S 504
 
0.8%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 41912
65.1%
Uppercase Letter 22468
34.9%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
T 20956
93.3%
R 504
 
2.2%
H 504
 
2.2%
S 504
 
2.2%
Lowercase Letter
ValueCountFrequency (%)
o 20956
50.0%
p 20956
50.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 64380
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
T 20956
32.6%
o 20956
32.6%
p 20956
32.6%
R 504
 
0.8%
H 504
 
0.8%
S 504
 
0.8%

Most occurring blocks

ValueCountFrequency (%)
ASCII 64380
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
T 20956
32.6%
o 20956
32.6%
p 20956
32.6%
R 504
 
0.8%
H 504
 
0.8%
S 504
 
0.8%

adwordsClickInfo.gclId
Text

Distinct: 17774
Distinct (%): 82.4%
Missing: 882092
Missing (%): 97.6%
Memory size: 29.5 MiB

Length

Max length: 92
Median length: 91
Mean length: 69.235703
Min length: 26

Characters and Unicode

Total characters: 1492791
Distinct characters: 64
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 15688
Unique (%): 72.8%

Sample

1st row: Cj0KEQjwxqS-BRDRgPLp0q2t0IUBEiQAgfMXRBVDYwnFawcmsrhs02pjO7FXPLhzHyvJFv53h1H4QJ8aAhtw8P8HAQ
2nd row: Cj0KEQjwxqS-BRDRgPLp0q2t0IUBEiQAgfMXRAq0D2zir1iAiqwgFU0lcMGVY6qaqhBTOFSAIW7gM8saAiku8P8HAQ
3rd row: Cj0KEQjwxqS-BRDRgPLp0q2t0IUBEiQAgfMXRMbhgNCALey5pPeCxitqlWsaKLtXW_EC8qRLRreq6OMaApJJ8P8HAQ
4th row: Cj0KEQjwxqS-BRDRgPLp0q2t0IUBEiQAgfMXRBRI7rtb79aCyB-UUNNHh1V712wows-T-MlL9VW-8ZEaAhqd8P8HAQ
5th row: Cj0KEQjwxqS-BRDRgPLp0q2t0IUBEiQAgfMXRDKcQOTkfRji3NxEErk_rDSPqc8VzHFSZnRcZBCoBOgaAgeG8P8HAQ
ValueCountFrequency (%)
cj0keqjwmirjbrcrmj_x7kdo-9obeiqauupkmufmpug3zdwyo8gtsjibfd5mphstza9y_9ncri8x97oaaglc8p8haq 70
 
0.3%
cj0keqjw1ee_brd3hk6x993yzeobeiqa5rh_bea562m9tvl_mtnafvtdndqoqrp1rvxmmgwjcx1lafwaaj4o8p8haq 41
 
0.2%
cjh1vbf94m8cfuelgqodyakhgq 29
 
0.1%
cj0keqiaw_debrchnyiq_562gsebeiqa4lcssmb_rwgvppnltzlzj5rgwqx5lk87wc5cjfcqznenzewaaiap8p8haq 27
 
0.1%
cjwkeaiaj7tcbrcp2z22ue-zrj4sjacg7sbejui6ggr6oca-edc2-lx7w1m5ia1c_qnbzwzvtquanxocb5rw_wcb 24
 
0.1%
cn_u9pavhdacfcnahgodtcqajw 22
 
0.1%
cjwkeaiaxkrfbrdm25f60oegtwwsjabgec-z0_dlpcxhm1ztqlr1ywewxu875yaqwupt7pgmgfezthoceezw_wcb 21
 
0.1%
cnhp7nf2ytmcfvlwdqod_iol5a 20
 
0.1%
cjwkeaiavs7cbrc24rao6bgcoiasjabact5dtalfxcossvr2e2aduhx6z6oe0kauvtqkzl-bcvn1-hocnlrw_wcb 20
 
0.1%
cj6xtee6j9acfqqdfgod8tkdnw 18
 
0.1%
Other values (17764) 21269
98.6%

Most occurring characters

ValueCountFrequency (%)
A 71957
 
4.8%
C 62703
 
4.2%
w 52857
 
3.5%
B 47608
 
3.2%
E 42671
 
2.9%
Q 42294
 
2.8%
j 40337
 
2.7%
K 32775
 
2.2%
o 31969
 
2.1%
R 30844
 
2.1%
Other values (54) 1036776
69.5%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter 686721
46.0%
Lowercase Letter 564972
37.8%
Decimal Number 193611
 
13.0%
Connector Punctuation 28864
 
1.9%
Dash Punctuation 18623
 
1.2%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
A 71957
 
10.5%
C 62703
 
9.1%
B 47608
 
6.9%
E 42671
 
6.2%
Q 42294
 
6.2%
K 32775
 
4.8%
R 30844
 
4.5%
D 26904
 
3.9%
I 24996
 
3.6%
P 23785
 
3.5%
Other values (16) 280184
40.8%
Lowercase Letter
ValueCountFrequency (%)
w 52857
 
9.4%
j 40337
 
7.1%
o 31969
 
5.7%
i 28639
 
5.1%
a 24959
 
4.4%
c 24188
 
4.3%
g 24027
 
4.3%
d 21898
 
3.9%
s 20665
 
3.7%
h 20152
 
3.6%
Other values (16) 275281
48.7%
Decimal Number
ValueCountFrequency (%)
8 30621
15.8%
0 24046
12.4%
9 18500
9.6%
4 18386
9.5%
7 18118
9.4%
6 17972
9.3%
3 17049
8.8%
2 16457
8.5%
5 16391
8.5%
1 16071
8.3%
Connector Punctuation
ValueCountFrequency (%)
_ 28864
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 18623
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 1251693
83.8%
Common 241098
 
16.2%

Most frequent character per script

Latin
ValueCountFrequency (%)
A 71957
 
5.7%
C 62703
 
5.0%
w 52857
 
4.2%
B 47608
 
3.8%
E 42671
 
3.4%
Q 42294
 
3.4%
j 40337
 
3.2%
K 32775
 
2.6%
o 31969
 
2.6%
R 30844
 
2.5%
Other values (42) 795678
63.6%
Common
ValueCountFrequency (%)
8 30621
12.7%
_ 28864
12.0%
0 24046
10.0%
- 18623
7.7%
9 18500
7.7%
4 18386
7.6%
7 18118
7.5%
6 17972
7.5%
3 17049
7.1%
2 16457
6.8%
Other values (2) 32462
13.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1492791
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
A 71957
 
4.8%
C 62703
 
4.2%
w 52857
 
3.5%
B 47608
 
3.2%
E 42671
 
2.9%
Q 42294
 
2.8%
j 40337
 
2.7%
K 32775
 
2.2%
o 31969
 
2.1%
R 30844
 
2.1%
Other values (54) 1036776
69.5%

adwordsClickInfo.adNetworkType
Categorical

HIGH CORRELATION  IMBALANCE  MISSING 

Distinct: 2
Distinct (%): < 0.1%
Missing: 882193
Missing (%): 97.6%
Memory size: 35.1 MiB
Google Search: 21453
Search partners: 7

Length

Max length: 15
Median length: 13
Mean length: 13.000652
Min length: 13

Characters and Unicode

Total characters: 278994
Distinct characters: 15
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Google Search
2nd row: Google Search
3rd row: Google Search
4th row: Google Search
5th row: Google Search

Common Values

Value | Count | Frequency (%)
Google Search | 21453 | 2.4%
Search partners | 7 | < 0.1%
(Missing) | 882193 | 97.6%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
search 21460
50.0%
google 21453
50.0%
partners 7
 
< 0.1%

Most occurring characters

ValueCountFrequency (%)
e 42920
15.4%
o 42906
15.4%
r 21474
7.7%
a 21467
7.7%
21460
7.7%
S 21460
7.7%
c 21460
7.7%
h 21460
7.7%
G 21453
7.7%
g 21453
7.7%
Other values (5) 21481
7.7%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 214621
76.9%
Uppercase Letter 42913
 
15.4%
Space Separator 21460
 
7.7%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 42920
20.0%
o 42906
20.0%
r 21474
10.0%
a 21467
10.0%
c 21460
10.0%
h 21460
10.0%
g 21453
10.0%
l 21453
10.0%
p 7
 
< 0.1%
t 7
 
< 0.1%
Other values (2) 14
 
< 0.1%
Uppercase Letter
ValueCountFrequency (%)
S 21460
50.0%
G 21453
50.0%
Space Separator
ValueCountFrequency (%)
21460
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 257534
92.3%
Common 21460
 
7.7%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 42920
16.7%
o 42906
16.7%
r 21474
8.3%
a 21467
8.3%
S 21460
8.3%
c 21460
8.3%
h 21460
8.3%
G 21453
8.3%
g 21453
8.3%
l 21453
8.3%
Other values (4) 28
 
< 0.1%
Common
ValueCountFrequency (%)
21460
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 278994
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 42920
15.4%
o 42906
15.4%
r 21474
7.7%
a 21467
7.7%
21460
7.7%
S 21460
7.7%
c 21460
7.7%
h 21460
7.7%
G 21453
7.7%
g 21453
7.7%
Other values (5) 21481
7.7%

adContent
Categorical

HIGH CORRELATION  MISSING 

Distinct: 44
Distinct (%): 0.4%
Missing: 892707
Missing (%): 98.8%
Memory size: 34.9 MiB
Google Merchandise Collection: 5122
Google Online Store: 1245
Display Ad created 3/11/14: 967
Full auto ad IMAGE ONLY: 822
Ad from 12/13/16: 610
Other values (39): 2180

Length

Max length: 43
Median length: 34
Mean length: 25.126622
Min length: 8

Characters and Unicode

Total characters: 275036
Distinct characters: 63
Distinct categories: 9
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 5
Unique (%): < 0.1%

Sample

1st row: Full auto ad IMAGE ONLY
2nd row: First Full Auto Template Test Ad
3rd row: {KeyWord:Google Brand Items}
4th row: Full auto ad IMAGE ONLY
5th row: Full auto ad IMAGE ONLY

Common Values

Value | Count | Frequency (%)
Google Merchandise Collection | 5122 | 0.6%
Google Online Store | 1245 | 0.1%
Display Ad created 3/11/14 | 967 | 0.1%
Full auto ad IMAGE ONLY | 822 | 0.1%
Ad from 12/13/16 | 610 | 0.1%
Ad from 11/3/16 | 489 | 0.1%
Display Ad created 3/11/15 | 392 | < 0.1%
{KeyWord:Google Brand Items} | 251 | < 0.1%
{KeyWord:Google Merchandise} | 155 | < 0.1%
Ad from 11/7/16 | 123 | < 0.1%
Other values (34) | 770 | 0.1%
(Missing) | 892707 | 98.8%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
google 6657
18.6%
merchandise 5368
15.0%
collection 5122
14.3%
ad 3572
10.0%
display 1409
 
3.9%
created 1409
 
3.9%
store 1252
 
3.5%
online 1245
 
3.5%
from 1225
 
3.4%
3/11/14 967
 
2.7%
Other values (60) 7620
21.3%

Most occurring characters

ValueCountFrequency (%)
e 29938
 
10.9%
o 29217
 
10.6%
24900
 
9.1%
l 22124
 
8.0%
n 13612
 
4.9%
i 13600
 
4.9%
c 12045
 
4.4%
d 11551
 
4.2%
a 10739
 
3.9%
r 10693
 
3.9%
Other values (53) 96617
35.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 192416
70.0%
Uppercase Letter 35962
 
13.1%
Space Separator 24900
 
9.1%
Decimal Number 14087
 
5.1%
Other Punctuation 6172
 
2.2%
Open Punctuation 677
 
0.2%
Close Punctuation 677
 
0.2%
Connector Punctuation 110
 
< 0.1%
Dash Punctuation 35
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 29938
15.6%
o 29217
15.2%
l 22124
11.5%
n 13612
7.1%
i 13600
7.1%
c 12045
6.3%
d 11551
 
6.0%
a 10739
 
5.6%
r 10693
 
5.6%
t 9533
 
5.0%
Other values (12) 29364
15.3%
Uppercase Letter
ValueCountFrequency (%)
G 8112
22.6%
M 6221
17.3%
C 5126
14.3%
A 3652
10.2%
O 2096
 
5.8%
D 1482
 
4.1%
S 1400
 
3.9%
I 1083
 
3.0%
F 1031
 
2.9%
L 996
 
2.8%
Other values (12) 4763
13.2%
Decimal Number
ValueCountFrequency (%)
1 8015
56.9%
3 2458
 
17.4%
6 1222
 
8.7%
4 1017
 
7.2%
2 688
 
4.9%
5 433
 
3.1%
7 179
 
1.3%
0 75
 
0.5%
Other Punctuation
ValueCountFrequency (%)
/ 5276
85.5%
: 677
 
11.0%
? 111
 
1.8%
% 75
 
1.2%
' 31
 
0.5%
! 2
 
< 0.1%
Space Separator
ValueCountFrequency (%)
24900
100.0%
Open Punctuation
ValueCountFrequency (%)
{ 677
100.0%
Close Punctuation
ValueCountFrequency (%)
} 677
100.0%
Connector Punctuation
ValueCountFrequency (%)
_ 110
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 35
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 228378
83.0%
Common 46658
 
17.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 29938
13.1%
o 29217
12.8%
l 22124
 
9.7%
n 13612
 
6.0%
i 13600
 
6.0%
c 12045
 
5.3%
d 11551
 
5.1%
a 10739
 
4.7%
r 10693
 
4.7%
t 9533
 
4.2%
Other values (34) 65326
28.6%
Common
ValueCountFrequency (%)
24900
53.4%
1 8015
 
17.2%
/ 5276
 
11.3%
3 2458
 
5.3%
6 1222
 
2.6%
4 1017
 
2.2%
2 688
 
1.5%
{ 677
 
1.5%
: 677
 
1.5%
} 677
 
1.5%
Other values (9) 1051
 
2.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII 275036
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 29938
 
10.9%
o 29217
 
10.6%
24900
 
9.1%
l 22124
 
8.0%
n 13612
 
4.9%
i 13600
 
4.9%
c 12045
 
4.4%
d 11551
 
4.2%
a 10739
 
3.9%
r 10693
 
3.9%
Other values (53) 96617
35.1%

conversion
Categorical

IMBALANCE 

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 50.0 MiB
0: 892138
1: 11515

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 903653
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 0
3rd row: 0
4th row: 0
5th row: 0

Common Values

Value | Count | Frequency (%)
0 | 892138 | 98.7%
1 | 11515 | 1.3%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
0 892138
98.7%
1 11515
 
1.3%

Most occurring characters

ValueCountFrequency (%)
0 892138
98.7%
1 11515
 
1.3%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 903653
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 892138
98.7%
1 11515
 
1.3%

Most occurring scripts

ValueCountFrequency (%)
Common 903653
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
0 892138
98.7%
1 11515
 
1.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII 903653
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 892138
98.7%
1 11515
 
1.3%
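
The IMBALANCE flag on conversion follows directly from the two class counts. As a small worked check (variable names are illustrative only), the conversion rate and the ratio of negatives to positives can be computed like this:

```python
# Class counts for the conversion variable, copied from its Common Values table.
counts = {"0": 892138, "1": 11515}

total = sum(counts.values())
rate = counts["1"] / total            # share of sessions with a conversion
ratio = counts["0"] / counts["1"]     # non-converting sessions per converting one

print(total, f"{100 * rate:.1f}%", round(ratio, 1))
```

The total matches the number of observations in the dataset, and the rate rounds to the 1.3% shown for class 1.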

dateColumn
DateTime

Distinct: 366
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 6.9 MiB
Minimum: 2016-08-01 00:00:00
Maximum: 2017-08-01 00:00:00
Histogram with fixed size bins (bins=50)
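
The distinct count of 366 is exactly the number of calendar days between the reported minimum and maximum, inclusive, which suggests that every day in the range appears at least once. The sketch below checks that arithmetic; it also shows one plausible way the integer `date` column (for example 20160902) could map to a date such as 2016-09-02. The helper name `parse_yyyymmdd` and the `%Y%m%d` format are assumptions read off the sample rows, not something the report states.

```python
from datetime import date, datetime

first = date(2016, 8, 1)   # reported Minimum
last = date(2017, 8, 1)    # reported Maximum
span_days = (last - first).days + 1   # inclusive of both endpoints

def parse_yyyymmdd(raw):
    """Parse an integer or string like 20160902 into a date (assumed format)."""
    return datetime.strptime(str(raw), "%Y%m%d").date()

print(span_days, parse_yyyymmdd(20160902))
```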

Interactions

[Figures: 36 interaction plots]

Correlations

Variable | adContent | adwordsClickInfo.adNetworkType | adwordsClickInfo.page | adwordsClickInfo.slot | campaign | channelGrouping | continent | conversion | date | deviceCategory | hits | isMobile | medium | metro | operatingSystem | pageviews | subContinent | transactionRevenue | visitNumber
adContent | 1.000 | 0.431 | 0.076 | 0.330 | 0.979 | 0.998 | 0.327 | 0.076 | 0.559 | 0.262 | 0.173 | 0.321 | 0.998 | 0.065 | 0.174 | 0.181 | 0.201 | -0.127 | -0.298
adwordsClickInfo.adNetworkType | 0.431 | 1.000 | -0.001 | 0.108 | 0.336 | 1.000 | 0.000 | 0.000 | -0.008 | 0.000 | -0.021 | 0.000 | 1.000 | 0.000 | 0.000 | -0.021 | 0.000 | NaN | 0.002
adwordsClickInfo.page | 0.076 | -0.001 | 1.000 | 0.186 | 0.005 | 1.000 | 0.000 | 0.000 | -0.003 | 0.004 | -0.032 | 0.013 | 1.000 | 0.000 | 0.000 | -0.032 | 0.000 | NaN | -0.008
adwordsClickInfo.slot | 0.330 | 0.108 | 0.186 | 1.000 | 0.098 | 1.000 | 0.013 | 0.009 | 0.015 | 0.109 | 0.082 | 0.059 | 1.000 | 0.035 | 0.070 | 0.084 | 0.000 | 0.005 | 0.045
campaign | 0.979 | 0.336 | 0.005 | 0.098 | 1.000 | 0.516 | 0.069 | 0.019 | 0.030 | 0.068 | 0.041 | 0.094 | 0.557 | 0.045 | 0.039 | 0.041 | 0.060 | -0.041 | 0.034
channelGrouping | 0.998 | 1.000 | 1.000 | 1.000 | 0.516 | 1.000 | 0.188 | 0.131 | -0.226 | 0.218 | -0.113 | 0.307 | 1.000 | 0.176 | 0.169 | -0.111 | 0.222 | 0.031 | -0.088
continent | 0.327 | 0.000 | 0.000 | 0.013 | 0.069 | 0.188 | 1.000 | 0.109 | -0.010 | 0.063 | -0.226 | 0.069 | 0.131 | 0.291 | 0.157 | -0.227 | 1.000 | 0.002 | -0.165
conversion | 0.076 | 0.000 | 0.000 | 0.009 | 0.019 | 0.131 | 0.109 | 1.000 | 0.011 | 0.045 | 0.196 | 0.045 | 0.035 | 0.123 | 0.089 | 0.199 | 0.123 | NaN | 0.109
date | 0.559 | -0.008 | -0.003 | 0.015 | 0.030 | -0.226 | -0.010 | 0.011 | 1.000 | 0.145 | -0.000 | 0.145 | 0.241 | 0.095 | 0.165 | 0.005 | 0.212 | -0.069 | 0.041
deviceCategory | 0.262 | 0.000 | 0.004 | 0.109 | 0.068 | 0.218 | 0.063 | 0.045 | 0.145 | 1.000 | -0.015 | 0.999 | 0.216 | 0.105 | 0.715 | -0.017 | 0.129 | -0.176 | -0.028
hits | 0.173 | -0.021 | -0.032 | 0.082 | 0.041 | -0.113 | -0.226 | 0.196 | -0.000 | -0.015 | 1.000 | 0.024 | 0.007 | 0.017 | 0.016 | 0.992 | 0.022 | 0.295 | 0.115
isMobile | 0.321 | 0.000 | 0.013 | 0.059 | 0.094 | 0.307 | 0.069 | 0.045 | 0.145 | 0.999 | 0.024 | 1.000 | 0.304 | 0.134 | 0.994 | -0.018 | 0.167 | -0.176 | -0.029
medium | 0.998 | 1.000 | 1.000 | 1.000 | 0.557 | 1.000 | 0.131 | 0.035 | 0.241 | 0.216 | 0.007 | 0.304 | 1.000 | 0.095 | 0.145 | -0.072 | 0.169 | 0.034 | -0.053
metro | 0.065 | 0.000 | 0.000 | 0.035 | 0.045 | 0.176 | 0.291 | 0.123 | 0.095 | 0.105 | 0.017 | 0.134 | 0.095 | 1.000 | 0.091 | -0.011 | 0.175 | -0.118 | -0.055
operatingSystem | 0.174 | 0.000 | 0.000 | 0.070 | 0.039 | 0.169 | 0.157 | 0.089 | 0.165 | 0.715 | 0.016 | 0.994 | 0.145 | 0.091 | 1.000 | -0.068 | 0.116 | -0.067 | -0.074
pageviews | 0.181 | -0.021 | -0.032 | 0.084 | 0.041 | -0.111 | -0.227 | 0.199 | 0.005 | -0.017 | 0.992 | -0.018 | -0.072 | -0.011 | -0.068 | 1.000 | 0.016 | 0.270 | 0.114
subContinent | 0.201 | 0.000 | 0.000 | 0.000 | 0.060 | 0.222 | 1.000 | 0.123 | 0.212 | 0.129 | 0.022 | 0.167 | 0.169 | 0.175 | 0.116 | 0.016 | 1.000 | 0.027 | -0.110
transactionRevenue | -0.127 | NaN | NaN | 0.005 | -0.041 | 0.031 | 0.002 | NaN | -0.069 | -0.176 | 0.295 | -0.176 | 0.034 | -0.118 | -0.067 | 0.270 | 0.027 | 1.000 | 0.218
visitNumber | -0.298 | 0.002 | -0.008 | 0.045 | 0.034 | -0.088 | -0.165 | 0.109 | 0.041 | -0.028 | 0.115 | -0.029 | -0.053 | -0.055 | -0.074 | 0.114 | -0.110 | 0.218 | 1.000
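
The "High correlation" alerts at the top of the report can be read off this matrix by thresholding each row. The sketch below does so for the `campaign` row. The cut-off of 0.5 is an assumption: it is not stated in the report, but it is consistent with the alert that campaign is highly correlated with adContent and 2 other fields. The function name `highly_correlated` is hypothetical.

```python
# The campaign row of the correlation matrix, copied from the table.
campaign_row = {
    "adContent": 0.979, "adwordsClickInfo.adNetworkType": 0.336,
    "adwordsClickInfo.page": 0.005, "adwordsClickInfo.slot": 0.098,
    "campaign": 1.000, "channelGrouping": 0.516, "continent": 0.069,
    "conversion": 0.019, "date": 0.030, "deviceCategory": 0.068,
    "hits": 0.041, "isMobile": 0.094, "medium": 0.557, "metro": 0.045,
    "operatingSystem": 0.039, "pageviews": 0.041, "subContinent": 0.060,
    "transactionRevenue": -0.041, "visitNumber": 0.034,
}

THRESHOLD = 0.5  # assumed cut-off, not stated in the report

def highly_correlated(row, own_name, threshold=THRESHOLD):
    """Return the other variables whose absolute correlation meets the threshold."""
    return sorted(
        name for name, value in row.items()
        if name != own_name and abs(value) >= threshold
    )

print(highly_correlated(campaign_row, "campaign"))
```

NaN entries, such as those in the transactionRevenue row, never satisfy the comparison, so they are skipped without special handling.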

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
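
To make the idea of nullity correlation concrete, here is a minimal pure-Python sketch: each column is turned into a present/absent indicator, and the Pearson correlation of two indicators is computed. The four rows are hypothetical toy data (with `None` standing in for a missing cell), and the function names are illustrative; this is not the plotting library's own code.

```python
import math

def missing_fraction(rows, column):
    """Share of rows in which the column is missing (None)."""
    return sum(row[column] is None for row in rows) / len(rows)

def nullity_correlation(rows, a, b):
    """Pearson correlation between the presence indicators of two columns."""
    xs = [0.0 if row[a] is None else 1.0 for row in rows]
    ys = [0.0 if row[b] is None else 1.0 for row in rows]
    n = len(rows)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return math.nan  # a column that is always (or never) missing
    return cov / math.sqrt(vx * vy)

# Hypothetical toy rows, not taken from the dataset.
rows = [
    {"slot": "Top", "page": 1, "keyword": None},
    {"slot": None, "page": None, "keyword": "(not-provided)"},
    {"slot": None, "page": None, "keyword": None},
    {"slot": "RHS", "page": 1, "keyword": "google-+-online"},
]

print(missing_fraction(rows, "slot"))
print(nullity_correlation(rows, "slot", "page"))
print(nullity_correlation(rows, "slot", "keyword"))
```

A value of 1 means two columns are always missing together. In this report, adwordsClickInfo.page and adwordsClickInfo.slot share the same missing count (882193), which is the kind of pattern such a heatmap would highlight.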

Sample

First rows
index | channelGrouping | date | visitNumber | continent | subContinent | country | region | metro | city | browser | operatingSystem | isMobile | deviceCategory | hits | pageviews | transactionRevenue | campaign | source | medium | keyword | referralPath | adwordsClickInfo.page | adwordsClickInfo.slot | adwordsClickInfo.gclId | adwordsClickInfo.adNetworkType | adContent | conversion | dateColumn
0 | Organic Search | 20160902 | 1 | Asia | Western Asia | Turkey | Izmir | (not-set) | Izmir | Chrome | Windows | False | desktop | 1 | 1.0 | NaN | (not set) | google | organic | (not-provided) | NaN | NaN | NaN | NaN | NaN | NaN | 0 | 2016-09-02
1 | Organic Search | 20160902 | 1 | Oceania | Australasia | Australia | not-available-in-demo-dataset | not-available-in-demo-dataset | not-available-in-demo-dataset | Firefox | Macintosh | False | desktop | 1 | 1.0 | NaN | (not set) | google | organic | (not-provided) | NaN | NaN | NaN | NaN | NaN | NaN | 0 | 2016-09-02
2 | Organic Search | 20160902 | 1 | Europe | Southern Europe | Spain | Community-of-Madrid | (not-set) | Madrid | Chrome | Windows | False | desktop | 1 | 1.0 | NaN | (not set) | google | organic | (not-provided) | NaN | NaN | NaN | NaN | NaN | NaN | 0 | 2016-09-02
3 | Organic Search | 20160902 | 1 | Asia | Southeast Asia | Indonesia | not-available-in-demo-dataset | not-available-in-demo-dataset | not-available-in-demo-dataset | UC Browser | Linux | False | desktop | 1 | 1.0 | NaN | (not set) | google | organic | google-+-online | NaN | NaN | NaN | NaN | NaN | NaN | 0 | 2016-09-02
4 | Organic Search | 20160902 | 2 | Europe | Northern Europe | United-Kingdom | not-available-in-demo-dataset | not-available-in-demo-dataset | not-available-in-demo-dataset | Chrome | Android | True | mobile | 1 | 1.0 | NaN | (not set) | google | organic | (not-provided) | NaN | NaN | NaN | NaN | NaN | NaN | 0 | 2016-09-02
5 | Organic Search | 20160902 | 1 | Europe | Southern Europe | Italy | not-available-in-demo-dataset | not-available-in-demo-dataset | not-available-in-demo-dataset | Chrome | Windows | False | desktop | 1 | 1.0 | NaN | (not set) | google | organic | (not-provided) | NaN | NaN | NaN | NaN | NaN | NaN | 0 | 2016-09-02
6 | Organic Search | 20160902 | 1 | Asia | Southern Asia | Pakistan | not-available-in-demo-dataset | not-available-in-demo-dataset | not-available-in-demo-dataset | Chrome | Windows | False | desktop | 1 | 1.0 | NaN | (not set) | google | organic | (not-provided) | NaN | NaN | NaN | NaN | NaN | NaN | 0 | 2016-09-02
7 | Organic Search | 20160902 | 1 | Oceania | Australasia | Australia | Queensland | (not-set) | Brisbane | Chrome | Windows | False | desktop | 1 | 1.0 | NaN | (not set) | google | organic | (not-provided) | NaN | NaN | NaN | NaN | NaN | NaN | 0 | 2016-09-02
8 | Organic Search | 20160902 | 1 | Europe | Western Europe | Austria | not-available-in-demo-dataset | not-available-in-demo-dataset | not-available-in-demo-dataset | Internet Explorer | Windows | False | desktop | 1 | 1.0 | NaN | (not set) | google | organic | (not-provided) | NaN | NaN | NaN | NaN | NaN | NaN | 0 | 2016-09-02
9 | Organic Search | 20160902 | 1 | Europe | Western Europe | Netherlands | not-available-in-demo-dataset | not-available-in-demo-dataset | not-available-in-demo-dataset | Firefox | Windows | False | desktop | 1 | 1.0 | NaN | (not set) | google | organic | (not-provided) | NaN | NaN | NaN | NaN | NaN | NaN | 0 | 2016-09-02

Last rows
index | channelGrouping | date | visitNumber | continent | subContinent | country | region | metro | city | browser | operatingSystem | isMobile | deviceCategory | hits | pageviews | transactionRevenue | campaign | source | medium | keyword | referralPath | adwordsClickInfo.page | adwordsClickInfo.slot | adwordsClickInfo.gclId | adwordsClickInfo.adNetworkType | adContent | conversion | dateColumn
903643 | Social | 20170104 | 4 | Americas | Northern America | United-States | California | San-Francisco-Oakland-San-Jose-CA | Fremont | Chrome | Macintosh | False | desktop | 11 | 10.0 | NaN | (not set) | groups.google.com | referral | NaN | /a/google.com/forum/ | NaN | NaN | NaN | NaN | NaN | 0 | 2017-01-04
903644 | Social | 20170104 | 1 | Americas | Northern America | United-States | not-available-in-demo-dataset | not-available-in-demo-dataset | not-available-in-demo-dataset | Chrome | iOS | True | tablet | 11 | 7.0 | NaN | (not set) | m.youtube.com | referral | NaN | /watch | NaN | NaN | NaN | NaN | NaN | 0 | 2017-01-04
903645 | Social | 20170104 | 1 | Americas | Northern America | United-States | New-York | New-York-NY | New-York | Safari (in-app) | iOS | True | mobile | 11 | 8.0 | NaN | (not set) | m.facebook.com | referral | NaN | / | NaN | NaN | NaN | NaN | NaN | 0 | 2017-01-04
903646 | Social | 20170104 | 1 | Oceania | Australasia | Australia | Victoria | (not-set) | Melbourne | Chrome | iOS | True | tablet | 15 | 12.0 | NaN | (not set) | youtube.com | referral | NaN | /yt/about/ | NaN | NaN | NaN | NaN | NaN | 0 | 2017-01-04
903647 | Social | 20170104 | 1 | Africa | Northern Africa | Egypt | not-available-in-demo-dataset | not-available-in-demo-dataset | not-available-in-demo-dataset | Chrome | Windows | False | desktop | 16 | 11.0 | NaN | (not set) | youtube.com | referral | NaN | /yt/about/ar/ | NaN | NaN | NaN | NaN | NaN | 0 | 2017-01-04
903648 | Social | 20170104 | 1 | Americas | Caribbean | Puerto-Rico | not-available-in-demo-dataset | not-available-in-demo-dataset | not-available-in-demo-dataset | Chrome | Windows | False | desktop | 17 | 15.0 | NaN | (not set) | youtube.com | referral | NaN | /yt/about/ | NaN | NaN | NaN | NaN | NaN | 0 | 2017-01-04
903649 | Social | 20170104 | 1 | Asia | Southern Asia | Sri-Lanka | not-available-in-demo-dataset | not-available-in-demo-dataset | not-available-in-demo-dataset | Chrome | Android | True | mobile | 18 | 13.0 | NaN | (not set) | youtube.com | referral | NaN | /yt/about/ | NaN | NaN | NaN | NaN | NaN | 0 | 2017-01-04
903650 | Social | 20170104 | 1 | Asia | Eastern Asia | South-Korea | Seoul | (not-set) | Seoul | Android Webview | Android | True | mobile | 24 | 21.0 | NaN | (not set) | youtube.com | referral | NaN | /yt/about/ko/ | NaN | NaN | NaN | NaN | NaN | 0 | 2017-01-04
903651 | Social | 20170104 | 1 | Asia | Southeast Asia | Indonesia | not-available-in-demo-dataset | not-available-in-demo-dataset | not-available-in-demo-dataset | Chrome | Windows | False | desktop | 24 | 22.0 | NaN | (not set) | facebook.com | referral | NaN | /l.php | NaN | NaN | NaN | NaN | NaN | 0 | 2017-01-04
903652 | Social | 20170104 | 1 | Americas | Central America | Mexico | not-available-in-demo-dataset | not-available-in-demo-dataset | not-available-in-demo-dataset | Chrome | Android | True | mobile | 31 | 31.0 | NaN | (not set) | youtube.com | referral | NaN | /yt/about/es-419/ | NaN | NaN | NaN | NaN | NaN | 0 | 2017-01-04

Duplicate rows

Most frequently occurring

index | channelGrouping | date | visitNumber | continent | subContinent | country | region | metro | city | browser | operatingSystem | isMobile | deviceCategory | hits | pageviews | transactionRevenue | campaign | source | medium | keyword | referralPath | adwordsClickInfo.page | adwordsClickInfo.slot | adwordsClickInfo.gclId | adwordsClickInfo.adNetworkType | adContent | conversion | dateColumn | # duplicates
22236 | Organic Search | 20170323 | 1 | Asia | Southern Asia | India | Tamil-Nadu | (not-set) | Erode | Chrome | Windows | False | desktop | 1 | 1.0 | NaN | (not set) | google | organic | (not-provided) | NaN | NaN | NaN | NaN | NaN | NaN | 0 | 2017-03-23 | 68
33865 | Organic Search | 20170630 | 1 | Americas | Northern America | United-States | not-available-in-demo-dataset | not-available-in-demo-dataset | not-available-in-demo-dataset | Safari | iOS | True | mobile | 1 | 1.0 | NaN | (not set) | google | organic | (not-provided) | NaN | NaN | NaN | NaN | NaN | NaN | 0 | 2017-06-30 | 55
29989 | Organic Search | 20170601 | 1 | Americas | Northern America | United-States | not-available-in-demo-dataset | not-available-in-demo-dataset | not-available-in-demo-dataset | Chrome | Windows | False | desktop | 1 | 1.0 | NaN | (not set) | google | organic | (not-provided) | NaN | NaN | NaN | NaN | NaN | NaN | 0 | 2017-06-01 | 51
34006 | Organic Search | 20170701 | 1 | Americas | Northern America | United-States | not-available-in-demo-dataset | not-available-in-demo-dataset | not-available-in-demo-dataset | Safari | iOS | True | mobile | 1 | 1.0 | NaN | (not set) | google | organic | (not-provided) | NaN | NaN | NaN | NaN | NaN | NaN | 0 | 2017-07-01 | 48
29975 | Organic Search | 20170601 | 1 | Americas | Northern America | United-States | not-available-in-demo-dataset | not-available-in-demo-dataset | not-available-in-demo-dataset | Chrome | Android | True | mobile | 1 | 1.0 | NaN | (not set) | google | organic | (not-provided) | NaN | NaN | NaN | NaN | NaN | NaN | 0 | 2017-06-01 | 39
30002 | Organic Search | 20170601 | 1 | Americas | Northern America | United-States | not-available-in-demo-dataset | not-available-in-demo-dataset | not-available-in-demo-dataset | Safari | iOS | True | mobile | 1 | 1.0 | NaN | (not set) | google | organic | (not-provided) | NaN | NaN | NaN | NaN | NaN | NaN | 0 | 2017-06-01 | 39
30161 | Organic Search | 20170602 | 1 | Americas | Northern America | United-States | not-available-in-demo-dataset | not-available-in-demo-dataset | not-available-in-demo-dataset | Chrome | Windows | False | desktop | 1 | 1.0 | NaN | (not set) | google | organic | (not-provided) | NaN | NaN | NaN | NaN | NaN | NaN | 0 | 2017-06-02 | 39
35063 | Organic Search | 20170710 | 1 | Americas | Northern America | United-States | not-available-in-demo-dataset | not-available-in-demo-dataset | not-available-in-demo-dataset | Safari | iOS | True | mobile | 1 | 1.0 | NaN | (not set) | google | organic | (not-provided) | NaN | NaN | NaN | NaN | NaN | NaN | 0 | 2017-07-10 | 39
36775 | Organic Search | 20170723 | 1 | Americas | Northern America | United-States | not-available-in-demo-dataset | not-available-in-demo-dataset | not-available-in-demo-dataset | Safari | iOS | True | mobile | 1 | 1.0 | NaN | (not set) | google | organic | (not-provided) | NaN | NaN | NaN | NaN | NaN | NaN | 0 | 2017-07-23 | 37
37183 | Organic Search | 20170726 | 1 | Americas | Northern America | United-States | not-available-in-demo-dataset | not-available-in-demo-dataset | not-available-in-demo-dataset | Safari | iOS | True | mobile | 1 | 1.0 | NaN | (not set) | google | organic | (not-provided) | NaN | NaN | NaN | NaN | NaN | NaN | 0 | 2017-07-26 | 37
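
A table of most frequently occurring duplicate rows can be produced by counting identical rows and keeping those seen more than once. The sketch below shows the idea with `collections.Counter` on a few hypothetical, shortened rows; the function name `duplicate_groups` is illustrative, and missing cells are represented as `None` so that identical rows compare equal.

```python
from collections import Counter

def duplicate_groups(rows):
    """Return (row, count) pairs for rows that occur more than once, most frequent first."""
    counts = Counter(tuple(row) for row in rows)
    return [(row, n) for row, n in counts.most_common() if n > 1]

# Hypothetical shortened rows: (channelGrouping, date, country, keyword).
rows = [
    ("Organic Search", 20170323, "India", "(not-provided)"),
    ("Social", 20170104, "Mexico", None),
    ("Organic Search", 20170323, "India", "(not-provided)"),
    ("Organic Search", 20170601, "United-States", "(not-provided)"),
    ("Organic Search", 20170323, "India", "(not-provided)"),
]

for row, n in duplicate_groups(rows):
    print(n, row)

# Share of duplicate rows reported in the dataset statistics.
duplicate_share = 38245 / 903653
print(f"{100 * duplicate_share:.1f}%")
```

Because the dataset has no session identifier among its columns, many sessions with a single hit from the same country, browser and day are indistinguishable, which is why the most frequent duplicates are all one-hit organic search visits.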